Summary
This specification outlines a system-wide benchmark that can be used to measure the performance impact of changes, and to identify libraries worth recompiling with new CFLAGS to improve performance.
Rationale
Performance on Debian ARM (and other architectures) can likely be improved by applying specific optimizations to certain libraries.
Assumptions
- That performance is currently limited by a handful of bottlenecks in the base system
- That we can provide both optimized and non-optimized libraries without significant problems
Design
- Creation of a system-wide benchmarking procedure (based on oprofile?)
Implementation
Code Changes
Code changes should include an overview of what needs to change, and in some cases even the specific details.
Test/Demo Plan
Benchmark Process
Deciding which libraries should be optimized requires identifying bottlenecks in the system, whether they are caused by a lack of floating-point support, by unaligned accesses that the kernel must trap and repair, or by code that simply executes slowly on ARM. A benchmark that shows where the current performance problems lie is therefore a necessity.
Requirements of this benchmarking procedure:
- Must be based on a stable release (benchmarking a moving target would be difficult and would produce variable results)
- Must be based on a common piece of hardware that most of the Debian/armel team has
- Must be able to show differences from one change to another, to reveal effects that may not be immediately obvious
- Must be able to take into account issues caused by X11 latency and the like
Ideas for benchmarking tools:
- oprofile
  - System-wide profiler; can show the time spent in individual libraries, which gives a good general idea of where to look for optimization opportunities and a clear indication of whether optimizing an individual library will help significantly
- bootchart
  - Boot-time chart of the processes executed and the time spent running each one; can indicate whether optimizing a core system library will noticeably improve boot time
Unresolved issues
This should highlight any issues that should be addressed in further specifications, not problems with the specification itself, since a specification with problems cannot be approved.
BoF agenda and discussion
Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.
Typical ARM performance/porting issues
- floats
- alignment
- structs
- char
- blending (-mtune=cortex-a8 combined with -march=armv4t can give unexpected performance)
Relevant ARM instructions
- ARMv5
  - pld (not really relevant, but emitted a lot)
  - clz
- ARMv6
  - SIMD instructions, useful for multimedia applications
- ARMv7
  - udiv/sdiv in Thumb code; not available in the ARMv7-A application profile
- Thumb
- VFP
  - much faster floating point
  - even faster in RunFast mode
  - the performance gain can be negated by the softfp ABI, which passes floating-point values in core registers and therefore needs slow VFP-to-core register moves
- VFPv3
- NEON
  - effectively needs hand-coded assembly
  - NEON autovectorization in GCC is expected in the future
- GCC
  - Code efficiency has dropped over time
  - Compile-time options cannot be intermixed within a library
  - The ABI remains the same: two builds of a library targeted at or tuned for different platforms can share the same ABI
Benchmarking Draft Notes
- Stable base for benchmarking
  - Based on the latest stable release
  - Have the base system and a standardized set of packages installed; we're not benchmarking the entire system, just the most common core packages (think GNOME/KDE/Xfce/server/etc.)