== Summary ==

This specification outlines a system-wide benchmark that can be used to measure the performance impact of changes and to identify libraries worth recompiling with new CFLAGS to increase performance.

== Rationale ==

Performance on Debian ARM (and other architectures) can likely be improved by specific optimizations of certain libraries.

== Assumptions ==

 * That performance is currently limited by a handful of bottlenecks in the base system
 * That we can provide both optimized and non-optimized libraries without major issues.

== Design ==

 * Creation of a system-wide benchmarking procedure (based on oprofile?)

== Implementation ==

=== Code Changes ===

Code changes should include an overview of what needs to change, and in some cases even the specific details.

== Test/Demo Plan ==

=== Benchmark Process ===

Determining which libraries should be optimized requires identifying bottlenecks in the system caused by a lack of floating-point support, unaligned accesses trapped and repaired by the kernel, or code that executes slowly on ARM. Creating a benchmark to determine where the current performance faults lie is a necessity.

Requirements of this benchmarking procedure:

 * Must be based off a stable release (attempting to benchmark off a moving target would be difficult and would produce variable results)
 * Based around a common piece of hardware that most of the Debian/armel team has
 * Must be able to show differences from one change to another, including changes that may not be immediately obvious
 * Must be able to take into account issues caused by X11 latency and the like

Ideas for benchmarking tools:

 * oprofile
  * system-wide profiler; can show time spent in individual libraries, give a generally good idea of where to look for places to optimize, and give a clear indication of whether an individual library optimization will help significantly
 * bootchart
  * boot-time chart of processes executed and time spent running each process.
    Can give a good idea of whether optimizing a core system library will noticeably improve boot time.

== Unresolved issues ==

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself, since any specification with problems cannot be approved.

== BoF agenda and discussion ==

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.

== Typical ARM performance/porting issues ==

 * floats
 * alignment
 * structs
 * char
 * blending (-mtune=cortex-a8 + -march=armv4t can cause unexpected performance)

== Relevant ARM instructions ==

 * ARMv5
  * pld (not really relevant, but emitted a lot)
  * clz
 * ARMv6
  * SIMD instructions - useful for multimedia applications
 * ARMv7
  * udiv/sdiv in thumb code, not available in armv7-a application profile
 * thumb
 * vfp
  * a lot faster floats
  * even faster in runfast mode
  * performance effect can be negated by using the softfp ABI, which uses slow VFP-to-general-purpose register moves
 * vfp3
 * neon
  * effectively needs handcoded asm
  * autovectorizing neon in gcc in future
 * gcc
  * Code efficiency has dropped over time
  * Can't intermix compile-time options within a library
  * ABIs remain the same; you can have the same ABI between two libraries targeting/tuned for a different platform

==== Draft Benchmarking Notes ====

 * Stable Base for Benchmarking
  * Based off latest stable release
  * Have the base system and a standardized set of packages installed; we're not benchmarking the entire system, just the most common core packages; think GNOME/KDE/Xfce/server/etc.

----
CategorySpec
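One requirement under Benchmark Process is that the procedure must show differences from one change to another, even when they are not immediately obvious. A minimal sketch of how that comparison might be done is given here, assuming each benchmark run yields a single wall-clock time in seconds. The function name `compare_runs`, the threshold of 2.0, and the sample timings are hypothetical and only for illustration; none of this is part of oprofile or bootchart.

```python
import math
import statistics


def compare_runs(baseline, candidate, threshold=2.0):
    """Compare two lists of timings (seconds) taken for the same workload.

    Returns (relative_change, t_stat, significant). A negative
    relative_change means the candidate build ran faster on average.
    'threshold' is an assumed cut-off on the t statistic, not a standard.
    """
    mean_b = statistics.mean(baseline)
    mean_c = statistics.mean(candidate)
    diff = mean_c - mean_b

    # Standard error of the difference between the two means (Welch form),
    # so that run-to-run noise is taken into account.
    se = math.sqrt(statistics.variance(baseline) / len(baseline)
                   + statistics.variance(candidate) / len(candidate))

    if se == 0:
        # No measured noise at all: any non-zero difference counts.
        t_stat = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    else:
        t_stat = diff / se

    relative_change = diff / mean_b
    return relative_change, t_stat, abs(t_stat) >= threshold


if __name__ == "__main__":
    # Made-up timings for illustration only, not real measurements.
    stock_build = [12.41, 12.38, 12.52, 12.47, 12.44]
    optimized_build = [11.90, 11.95, 11.88, 12.01, 11.93]

    change, t, significant = compare_runs(stock_build, optimized_build)
    print(f"relative change: {change:+.1%}, t = {t:.1f}, "
          f"significant: {significant}")
```

Repeating each run several times and comparing the difference against the measured noise is one way to keep small improvements from being lost in variation caused by X11 latency and similar effects; a real procedure would still need to settle on the workload and the number of repetitions.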