Differences between revisions 10 and 18 (spanning 8 versions)
Revision 10 as of 2022-09-05 14:27:45
Size: 5020
Comment:
Revision 18 as of 2022-09-07 04:41:07
Size: 6852
Editor: PaulWise
Comment: enhance getauxval item slightly

Debian generally builds binaries for the baseline instruction set of each architecture.

Often upstream projects want to require newer instruction targets, which means that the resulting binaries won't work on older systems.

There are several options to resolve these sorts of conflicts. These changes are often suitable for upstream, but can also be carried just in Debian, with a small amount of maintenance needed as the upstream code changes.

Porting between targets

Lack of a suitable instruction target broadly falls into two cases:

  • Lack of so-called SIMD instructions (vectorized maths). In this case SIMDEverywhere and similar projects allow building code written for one instruction target (such as x86 SSE) to work on another instruction set (such as ARM NEON) or even on just the architecture baseline. When combined with runtime instruction selection, this can greatly increase the portability of code that was written with one target in mind, often in a way that patches can be sent and accepted upstream. There are several example patches from packages that have switched to SIMDe.

  • Lack of atomic or SMP instructions. In this case linking with libatomic will help and improve upstream portability.
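A minimal sketch of the atomic case, assuming a C11 compiler; the counter and the record_event name are hypothetical. On 64-bit architectures the increment normally compiles to a single instruction, while on some 32-bit architectures GCC emits a call to a libatomic helper (such as __atomic_fetch_add_8) instead, so the program then has to be linked with -latomic.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit event counter shared between threads. */
static _Atomic uint64_t events;

/* Increment the counter and return the new value.
   On some 32-bit architectures this becomes a call into libatomic,
   which is why such builds need -latomic on the link line. */
uint64_t record_event(void)
{
    /* Relaxed ordering is enough for a simple statistics counter. */
    return atomic_fetch_add_explicit(&events, 1, memory_order_relaxed) + 1;
}
```

Adding -latomic unconditionally is usually harmless on architectures that don't need it, which keeps the build rules the same everywhere.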

Runtime selection

Manual

Your code can check the CPU it is currently running on and run the appropriate code based on the available instructions.

On x86 the CPUID instruction provides this information, and the __builtin_cpu_supports built-in function of GCC and clang provides a simpler interface to it.
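A minimal sketch of such a check, assuming GCC or clang; the best_target function and the list of targets are illustrative. Only x86 is probed, so other architectures simply report the baseline.

```c
/* Return a label for the best instruction target supported by the running CPU.
   Only x86 is probed; every other architecture reports the baseline. */
const char *best_target(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Strictly needed only in code that may run before constructors
       (such as ifunc resolvers), but harmless elsewhere. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
    if (__builtin_cpu_supports("sse4.2"))
        return "sse4.2";
#endif
    return "baseline";
}
```

A caller would typically check once at startup and store a function pointer, rather than probing the CPU inside hot loops.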

The getauxval function (LWN article) can also be used for this, with the AT_HWCAP/AT_HWCAP2 options. The LD_SHOW_AUXV=1 sleep 0 command prints all the getauxval results on the command line.
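A minimal sketch of reading these values, assuming Linux with glibc 2.16 or later (the release that added getauxval); print_hwcaps is a hypothetical name. The bit masks are architecture-specific; on arm64, for example, glibc's sys/auxv.h provides names such as HWCAP_ASIMD for them.

```c
#include <stdio.h>
#include <sys/auxv.h>

/* Print the hardware capability words the kernel passed to this process
   and return AT_HWCAP. Bit meanings are architecture-specific. */
unsigned long print_hwcaps(void)
{
    unsigned long hwcap = getauxval(AT_HWCAP);
#ifdef AT_HWCAP2
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
#else
    unsigned long hwcap2 = 0; /* older headers lack AT_HWCAP2 */
#endif
    /* AT_PLATFORM holds a pointer to a string such as "x86_64", or 0. */
    const char *platform = (const char *)getauxval(AT_PLATFORM);

    printf("AT_HWCAP:    0x%lx\n", hwcap);
    printf("AT_HWCAP2:   0x%lx\n", hwcap2);
    printf("AT_PLATFORM: %s\n", platform ? platform : "(unset)");
#if defined(__aarch64__)
    /* On arm64, AdvSIMD (NEON) support is reported in HWCAP_ASIMD. */
    printf("ASIMD:       %s\n", (hwcap & HWCAP_ASIMD) ? "yes" : "no");
#endif
    return hwcap;
}
```

Unlike /proc/cpuinfo, these values come from the kernel for the running process, so no text parsing is involved.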

Please note that using /proc/cpuinfo is not reliable, particularly in a multiarch context and when using qemu-user.

Function multi-versioning

Function multi-versioning (LWN article) involves using a compiler-supplied ifunc (more) at program start to automatically resolve functions to the right instruction target.

Manual

You can write your own ifunc that will be run at program start to automatically resolve functions to the right instruction target. This could be a useful replacement for the target attribute (see below) where it isn't supported.
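A minimal sketch of a hand-written ifunc, assuming GCC or clang on an ELF system with glibc; add, add_baseline, add_avx2 and resolve_add are hypothetical names, and the AVX2 variant only has a trivial body here. The GCC documentation notes that resolvers must call __builtin_cpu_init() before __builtin_cpu_supports(), because they can run before the constructor that normally initialises CPU detection.

```c
/* Function pointer type shared by all variants and the resolver. */
typedef int (*add_fn)(int, int);

/* Baseline variant: runs on any CPU of the architecture. */
static int add_baseline(int a, int b)
{
    return a + b;
}

#if defined(__x86_64__) || defined(__i386__)
/* Variant compiled with AVX2 enabled; real code would use AVX2 here. */
__attribute__((target("avx2")))
static int add_avx2(int a, int b)
{
    return a + b;
}
#endif

/* Resolver: called while relocations are processed at load time,
   possibly before constructors, so CPU detection is initialised here. */
static add_fn resolve_add(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_avx2;
#endif
    return add_baseline;
}

/* Calls to add() go to whichever variant resolve_add() returned. */
int add(int a, int b) __attribute__((ifunc("resolve_add")));
```

Unlike target_clones, this lets each variant have a completely different body, at the cost of maintaining the resolver by hand.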

target_clones attribute

The target_clones attribute can allow you to compile one implementation of a function for multiple instruction targets and then select the best one at runtime:

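/* The compiler emits one clone per target plus a resolver that picks the best clone at load time. */
__attribute__((target_clones("avx2", "default")))
int foo(void){
  return 1;
}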

This is supported for C and C++ source files in GCC 6+ and clang 14+ compilers.
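A slightly more realistic sketch, assuming GCC 6+ or clang 14+ on x86-64: the loop is simple enough for the compiler to auto-vectorise when optimisation is enabled, so each clone can use wider vector instructions. The scale function is hypothetical, and the #if guard is an assumption that lets the same file still build, without clones, on other architectures.

```c
#include <stddef.h>

/* On x86-64, build AVX2, SSE4.2 and baseline clones of scale(); the
   clone is picked once at load time through a compiler-generated ifunc. */
#if defined(__x86_64__)
__attribute__((target_clones("avx2", "sse4.2", "default")))
#endif
void scale(float *restrict dst, const float *restrict src, size_t n, float k)
{
    /* Simple loop that the compiler can vectorise differently per clone. */
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```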

In theory it should be possible to use #ifdef in target_clones functions in C source files in GCC to get different implementations for different targets, but the necessary changes have not been implemented.

target attribute

The target attribute can allow you to write independent implementations of a function for multiple instruction targets and then select the best one at runtime:

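// Used on CPUs that match no more specific version.
__attribute__ ((target ("default")))
int foo ()
{
  return 0;
}

// Selected at runtime on CPUs that support SSE4.2.
__attribute__ ((target ("sse4.2")))
int foo ()
{
  return 1;
}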

This is supported for C++ source files in GCC 4.8+ and clang 8+ compilers.

In theory this could be supported for C source files in GCC but the necessary changes have not been added yet.

hwcaps

The hwcaps feature allows you to build an entire library for multiple instruction targets, install them into a different directory for each target and then select the best one at runtime.

This is supported for ELF libraries in glibc. Version 2.33 improved the mechanism significantly by adding glibc-hwcaps subdirectories named after ISA levels, such as x86-64-v3 or power9.

scripts

Wrapper scripts allow you to build an entire program for multiple instruction targets, install each build into a different directory and then select the best one at runtime.

The Debian Med team has a simd-dispatch script that is an example of this approach. In the future the isa-support source package may provide a simpler way to do this.

Please note that using /proc/cpuinfo is not reliable, particularly in a multiarch context and when using qemu-user.
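A minimal sketch of the same idea as a small C launcher rather than a shell script, assuming GCC or clang; the /usr/lib/foo/&lt;target&gt;/foo layout and every name below are hypothetical, and the sketch is not based on simd-dispatch. It asks the CPU through __builtin_cpu_supports instead of parsing /proc/cpuinfo.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical install layout: FOO_LIBDIR/<target>/foo */
#define FOO_LIBDIR "/usr/lib/foo"

/* Pick the best build for the running CPU (only x86 is probed). */
static const char *foo_target(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
    if (__builtin_cpu_supports("sse4.2"))
        return "sse4.2";
#endif
    return "baseline";
}

/* Write the path of the selected build into buf; return 0 on success. */
int foo_binary_path(char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s/%s/foo", FOO_LIBDIR, foo_target());
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Replace the current process with the selected build. */
int foo_exec(char *argv[])
{
    char path[256];
    if (foo_binary_path(path, sizeof path) != 0)
        return -1;
    execv(path, argv);
    perror(path); /* only reached if execv failed */
    return -1;
}
```

A real launcher would call foo_exec(argv) from main and, if it returns, fall back to the baseline build or exit with an error.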

Blocking unsupported systems

The binary packages of src:isa-support allow preventing installation of packages that require instructions that are not available on the current CPU.

Packages can instead allow installation but have scripts or programs check the CPU they are currently running on and print an error message to stderr, or show one using a graphical tool. The chromium wrapper script is an example of this approach. In the future the isa-support source package may provide a simpler way to do this.

Please note that using /proc/cpuinfo is not reliable, particularly in a multiarch context and when using qemu-user.
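A minimal sketch of such a check for a program written in C, assuming GCC or clang on x86; the SSE4.2 requirement and the check_cpu_requirements name are hypothetical. The check should live in a file compiled with baseline flags and run first in main, since code compiled for a newer target could otherwise fail with an illegal instruction before the message is printed.

```c
#include <stdio.h>

/* Return 0 if the CPU meets the (hypothetical) SSE4.2 requirement;
   otherwise print an explanation to stderr and return 1. */
int check_cpu_requirements(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (!__builtin_cpu_supports("sse4.2")) {
        fprintf(stderr,
                "error: this program requires a CPU with SSE4.2 support\n");
        return 1;
    }
#endif
    return 0;
}
```

Typical use is `if (check_cpu_requirements() != 0) return 1;` as the first statement of main.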

Architectures

Partial

In theory, additional architectures with increased baselines could be added under different architecture names, building only select packages where the performance difference is noticeable. SIMDebian took this approach.

Change baselines

Increasing the CPU requirements of an architecture can improve its performance on newer or more capable hardware, while preventing the port from being used on older or less capable hardware. Decreasing the CPU requirements does the opposite, of course. Benchmarks are needed to determine exactly how useful a change in CPU requirements would be.

Build flags

Users can use Debian build flags to rebuild individual packages for their own use. See the dpkg-buildflags manual page for the override mechanism and the compiler documentation for which build flags to select. Some packages do not properly pass the new flags to their build system, so inspect the build logs to check that the flags were used, and file bugs against packages that don't inject build flags correctly. Benchmarks are needed to determine exactly how useful a build flags change would be.
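When rebuilding with extra flags (for example -mavx2 appended through DEB_CFLAGS_APPEND, as described in dpkg-buildflags(1)), one quick way to confirm that the flags reached the compiler is to check the macros it predefines. A minimal sketch, assuming GCC or clang on x86; compiled_simd_level is a hypothetical function that only checks a few extensions.

```c
/* Report the newest of a few x86 SIMD extensions this file was compiled
   for. The compiler predefines these macros only when the active -m/-march
   flags enable the extension, so this reflects build flags, not the CPU. */
const char *compiled_simd_level(void)
{
#if defined(__AVX2__)
    return "avx2";
#elif defined(__SSE4_2__)
    return "sse4.2";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "none of avx2/sse4.2/sse2";
#endif
}
```

Printing this from a rebuilt program shows whether the override took effect, although reading the compiler command lines in the build log remains the more direct check.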