Multiarch Architecture Specifiers (Tuples)
This document describes an abandoned proposal to introduce a new schema for representing architecture ABIs as directory names. Further development of multiarch is proceeding using normalized GNU triplets.
This page is part of the larger effort to implement Multiarch library handling on Linux. For more information about Multiarch, see Multiarch and https://wiki.ubuntu.com/MultiarchSpec.
For notes on the BoF held at DebConf 10 on this topic, see Multiarch/DebConf10notes.
The problem
Multiarch specifies a method of making libraries for multiple, mutually incompatible architectures installable on a single filesystem in a manner that ensures the same binaries can be used without modification on any system. To accomplish this, we require a unique identifier for each mutually incompatible set of libraries that we want to be co-installable. We want these identifiers to be specified in as vendor-neutral a manner as possible, to ensure we retain maximal binary compatibility across Linux distributions.
Why not use GNU triplets?
Earlier proposals for multiarch did use GNU triplets as the components of the library path. Proof-of-concept implementations ran into two distinct cases where GNU triplets could not be used effectively in a cross-distribution manner:
- On IA-32, the GNU triplet varies according to the precise instruction set being targeted: e.g., i486-linux-gnu, i586-linux-gnu or i686-linux-gnu. Such triplets will be inconsistent over time within a single distribution as compiler defaults are changed, let alone between distributions.
- On ARM, the canonical GNU triplet for EABI systems, whether using hard-float or soft-float, is arm-linux-gnueabi, despite the fact that hard-float and soft-float libraries use different calling conventions and cannot be intermixed. The advice given by upstream GCC developers was to use the vendor field in the GNU triplet to express this difference[1]; however, the vendor field is private by design, making this unreliable for cross-distribution use.
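The two failure modes can be made concrete with a small Python sketch. The helper names and the data are illustrative assumptions, not part of the proposal: the first function shows that the IA-32 triplets differ only in a CPU field that every consumer would have to fold together by hand, and the second shows that the two ARM float ABIs cannot be told apart by their canonical triplet at all.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches the CPU field of IA-32 GNU triplets: i386, i486, i586, i686.
_IA32_CPU = re.compile(r"i[3-6]86")


def normalize_ia32_cpu(triplet: str) -> str:
    """Return the CPU field of a GNU triplet, folding every IA-32 variant
    into "x86". Other CPU fields are returned unchanged (sketch only)."""
    cpu = triplet.split("-", 1)[0]
    return "x86" if _IA32_CPU.fullmatch(cpu) else cpu


def colliding_triplets(abis: dict[str, str]) -> dict[str, list[str]]:
    """Group ABI descriptions by triplet and keep only the triplets that
    are shared by more than one (mutually incompatible) ABI."""
    groups: dict[str, list[str]] = defaultdict(list)
    for abi, triplet in abis.items():
        groups[triplet].append(abi)
    return {t: sorted(names) for t, names in groups.items() if len(names) > 1}


# Canonical triplets for a few ABIs, as described in the list above.
CANONICAL_TRIPLETS = {
    "IA-32": "i686-linux-gnu",
    "ARM EABI, soft-float": "arm-linux-gnueabi",
    "ARM EABI, hard-float": "arm-linux-gnueabi",
}

for t in ("i486-linux-gnu", "i586-linux-gnu", "i686-linux-gnu"):
    print(t, "->", normalize_ia32_cpu(t))
print(colliding_triplets(CANONICAL_TRIPLETS))
```

The first loop needs an explicit folding rule just to recover one identifier; the second call reports that arm-linux-gnueabi stands for two incompatible ABIs, which no folding rule can undo.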
Proposed solution
Given this imperfect mapping to GNU triplets, we recommend the introduction of an entirely new tuple scheme to use as the directory structure, rather than confusing matters by using triplets for most architectures and diverging from them for a smaller number of architectures. We propose an initial set of tuples for immediate use. The tuple scheme should be maintained by a neutral standards body, such as the LSB WG.
Using a tuple that encodes each of the axes along which binary incompatibility can be introduced is a sound approach. We propose a hierarchical scheme of tuples of the general form:
<cpu name>-<syscall abi>-<calling conventions and libc abi>
We say that the scheme is hierarchical because the meaning of each term depends on the terms that precede it; for instance, the linux syscall ABI refers to a different syscall table on an x86_64 CPU than it does on a PowerPC CPU. However, when terms are reused they should be used in a generally consistent manner: linux should be used to describe the preferred syscall interface of the Linux kernel on the named CPU at the time of adoption; it should not be used to describe a BSD emulation mode.
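As an illustration of the general form, the following Python sketch parses a tuple into its three fields. It assumes, as holds for every value proposed on this page, that no field value contains a hyphen; the class and function names are hypothetical. The syscall_key method reflects the hierarchy: a syscall ABI value is only meaningful together with the CPU name that precedes it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MultiarchTuple:
    cpu: str      # machine type, word size and endianness
    syscall: str  # syscall ABI, interpreted relative to cpu
    libc: str     # calling convention and libc ABI, relative to cpu and syscall

    def __str__(self) -> str:
        return f"{self.cpu}-{self.syscall}-{self.libc}"

    def syscall_key(self) -> tuple[str, str]:
        # "linux" on x86_64 and "linux" on ppc64 are different syscall tables,
        # so the syscall field is only comparable together with the CPU name.
        return (self.cpu, self.syscall)


def parse_tuple(text: str) -> MultiarchTuple:
    """Split "<cpu>-<syscall abi>-<libc abi>" into its three fields.

    Assumes no field value contains '-' (true of all values proposed here).
    """
    fields = text.split("-")
    if len(fields) != 3 or not all(fields):
        raise ValueError(f"expected three non-empty fields: {text!r}")
    return MultiarchTuple(*fields)


t = parse_tuple("arm-linux-glibc_hf")
print(t.cpu, t.syscall, t.libc)
```

Keying on (cpu, syscall) rather than on the syscall field alone is the design choice that follows from the hierarchy: two tuples that both say linux are not assumed to share a syscall ABI unless their CPU names also match.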
CPU name
The first field encodes information about the machine type, word size, and endianness. Each value of this field implies a particular base instruction set; however, there is no requirement that libraries installed to a directory referencing a given CPU name be limited to code that runs on machines implementing only this base instruction set. For example, a CPU value of x86 refers to the instruction set first implemented on the 80386 processor, but most current implementations will make use of instructions first made available in the P5 and P6 processors; few if any will run on true 80386 processors. In this way, enabling the use of optional instruction set extensions should not require a different multiarch tuple. Information about such extensions can instead be encoded using systems such as glibc's hwcap mechanism.
However, in a case where a member of a processor family lacks support for instructions that are considered "legacy" or otherwise implements support for only a subset of the instructions that are part of the base CPU definition, and there is any need to support code both with and without the use of these instructions (which will normally be the case), a new CPU identifier should be defined. Possible examples are: ARM vs. Thumb-2; PowerPC vs. PowerPC SPE; and Motorola 68k vs. Coldfire 68k.
An initial set of values for this field is proposed below.
value     instruction set     endianness  wordsize (bits)
arm       ARM                 little      32
armeb     ARM                 big         32
alpha     Alpha               little      64
avr32     AVR32               big         32
hppa      PA-RISC             big         32
ia64      IA-64               little      64
m32r      Renesas M32R        big         32
m68k      Motorola 68k        big         32
mips      MIPS                big         32
mips64    MIPS64              big         64
mipsel    MIPS                little      32
mips64el  MIPS64              little      64
ppc       PowerPC             big         32
ppc64     PowerPC             big         64
ppc_spe   PowerPC e500 (SPE)  big         32
s390      System/390          big         32
s390x     z/Architecture      big         64
sh3       SH-3                little      32
sh3eb     SH-3                big         32
sh4       SH-4                little      32
sh4eb     SH-4                big         32
sparc     SPARC               big         32
sparc64   SPARC64             big         64
x86       IA-32               little      32
x86_64    x86_64              little      64
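Because endianness and word size are folded into the CPU name, a consumer can recover them from the name alone, without consulting a C library. The Python sketch below encodes a subset of the CPU table (chosen arbitrarily for brevity) and shows that one base instruction set can correspond to several CPU names; the data class and helper names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CpuInfo:
    instruction_set: str
    endianness: str  # "little" or "big"
    wordsize: int    # in bits


# A subset of the proposed CPU names, with values taken from the table.
CPU_NAMES = {
    "arm": CpuInfo("ARM", "little", 32),
    "armeb": CpuInfo("ARM", "big", 32),
    "mips": CpuInfo("MIPS", "big", 32),
    "mipsel": CpuInfo("MIPS", "little", 32),
    "mips64": CpuInfo("MIPS64", "big", 64),
    "mips64el": CpuInfo("MIPS64", "little", 64),
    "s390x": CpuInfo("z/Architecture", "big", 64),
    "x86": CpuInfo("IA-32", "little", 32),
    "x86_64": CpuInfo("x86_64", "little", 64),
}


def variants(instruction_set: str) -> list[str]:
    """CPU names sharing a base instruction set; each differs in endianness
    or word size and therefore needs its own tuple."""
    return sorted(name for name, info in CPU_NAMES.items()
                  if info.instruction_set == instruction_set)


def describe(cpu: str) -> str:
    info = CPU_NAMES[cpu]  # raises KeyError for names outside this subset
    return f"{cpu}: {info.instruction_set}, {info.endianness}-endian, {info.wordsize}-bit"


print(variants("ARM"))
print(describe("x86"))
```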
Syscall ABI
The second field encodes information about the syscall ABI of the implementation. That includes information such as the order and meaning of syscalls, as well as their calling convention. In most cases, there will be a one-to-one mapping between syscall ABIs and implementing kernels, and we recommend using the kernel name as the value of the field. In other cases, a kernel implementation for a given CPU family may have gone through multiple syscall ABIs over time, or may even support multiple ABIs simultaneously, selectable at runtime (called "personalities" in Linux). In those cases, the kernel name will be used as the identifier for the predominant or default syscall ABI in use at the time of adoption, and variants of the kernel name may be used for other supported syscall ABIs as necessary.
value      CPU name   personality  description
linux      *          linux
linux      arm,armeb  linux32      ARM Linux syscalls from 2.6.16 onwards
linuxapcs  arm,armeb  linux        ARM syscalls; default prior to version 2.6.16, available via CONFIG_OABI_COMPAT afterwards
freebsd    *
gnu        *                       Hurd
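The rule for the syscall field (the plain kernel name for the predominant ABI, a variant of it for older or alternative ABIs) can be sketched as a small Python function. The kernel input strings "linux", "freebsd" and "hurd" and the arm_oabi flag are assumptions made for this illustration; only the returned values come from the table.

```python
from __future__ import annotations

ARM_CPUS = frozenset({"arm", "armeb"})


def syscall_field(kernel: str, cpu: str, arm_oabi: bool = False) -> str:
    """Choose the syscall ABI field for a tuple (hypothetical helper).

    kernel: "linux", "freebsd" or "hurd" (input names assumed for this sketch)
    arm_oabi: True for the ARM syscall ABI that was the default before 2.6.16
    """
    if arm_oabi and cpu not in ARM_CPUS:
        raise ValueError("arm_oabi only applies to arm and armeb")
    if kernel == "linux":
        # The plain kernel name denotes the predominant ABI at adoption time;
        # the older ARM ABI gets a variant of the name.
        return "linuxapcs" if arm_oabi else "linux"
    if kernel == "freebsd":
        return "freebsd"
    if kernel == "hurd":
        return "gnu"  # the Hurd is identified by "gnu"
    raise ValueError(f"no syscall ABI value proposed for kernel {kernel!r}")


print(syscall_field("linux", "armeb", arm_oabi=True))
```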
Calling conventions and libc ABI
The final field encapsulates the user space ABI: the ABI of the standard C library (set of functions implemented, type definitions, base symbol versions supported) and the register calling convention for inter-library functions.
It is assumed that all relevant implementations will have a corresponding C library, as without a standard library there is little value in producing shared libraries for a target and therefore no need for these to be co-installable on the filesystem. Nevertheless, the possible inclusion of libc-less targets in this schema influenced the decision to encode endianness in the CPU name field, even though in some cases a processor family may be endianness-agnostic.
As with syscall names, the same name will generally be used for a given libc implementation across multiple hardware architectures where applicable. Mutually incompatible ABIs provided by a given C library within a single architecture may commonly be represented as variations on the base name of the C library.
In the case of the GNU C library, the name "glibc" is preferred over "gnu" to avoid confusion between GNU triplets and multiarch tuples.
value      CPU name  syscall ABI  description
glibc      *         *            glibc as provided by GNU upstream; also eglibc when built in full compatibility mode
glibc      mips64    *            glibc, MIPS n64 ABI
glibc_n32  mips64    *            glibc, MIPS n32 ABI
glibc      arm       linux        glibc, ARM EABI with soft-float calling convention
glibc_hf   arm       linux        glibc, ARM EABI with hard-float calling convention
uclibc     *         *            uClibc
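The libc table mixes generic rows ("*" in a column) with rows that refine the meaning of a value for one CPU or syscall ABI; for example, glibc on mips64 means the n64 ABI. The Python sketch below looks up a value under the assumption that the most specific matching row applies; that precedence rule and the function name are illustrative assumptions, since the proposal does not state one explicitly.

```python
from __future__ import annotations

# Rows of the libc table: (value, CPU name, syscall ABI, description).
# "*" matches any CPU name or syscall ABI.
LIBC_ROWS = [
    ("glibc", "*", "*", "glibc as provided by GNU upstream"),
    ("glibc", "mips64", "*", "glibc, MIPS n64 ABI"),
    ("glibc_n32", "mips64", "*", "glibc, MIPS n32 ABI"),
    ("glibc", "arm", "linux", "glibc, ARM EABI with soft-float calling convention"),
    ("glibc_hf", "arm", "linux", "glibc, ARM EABI with hard-float calling convention"),
    ("uclibc", "*", "*", "uClibc"),
]


def describe_libc(value: str, cpu: str, syscall: str) -> str | None:
    """Return the description of the most specific row matching the tuple,
    or None if the libc value is not defined for this CPU and syscall ABI."""
    best, best_score = None, -1
    for row_value, row_cpu, row_syscall, desc in LIBC_ROWS:
        if row_value != value:
            continue
        if row_cpu not in ("*", cpu) or row_syscall not in ("*", syscall):
            continue
        # Assumed precedence: more non-wildcard columns means more specific.
        score = (row_cpu != "*") + (row_syscall != "*")
        if score > best_score:
            best, best_score = desc, score
    return best


print(describe_libc("glibc", "mips64", "linux"))
```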
Supporting interfaces
To avoid divergent embedded implementations of architecture-to-tuple mappings in every piece of software that wants to make use of these paths, a standard command-line tool that encapsulates this mapping should be provided, similar to the config.guess and config.sub tools that are a standard part of GNU autotools. We recommend the name lsb_architecture for this tool.
The tool should accept no command-line arguments and should instead interrogate the configured build environment, respecting standard environment variables such as CC, CPPFLAGS and CFLAGS. It may also respect distribution-specific environment settings, such as DEB_HOST_ARCH. On success, the tool prints the string representation of the multiarch tuple to stdout and exits with status 0. If it cannot determine a tuple for the target, it exits with a non-zero status and may print an explanatory error message to stderr.
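A minimal sketch of such a tool follows, in Python. It is not the preliminary reference implementation mentioned on this page, and its detection strategy is an assumption made for illustration: it asks the compiler named by CC (default cc) for its predefined macros using the GCC/Clang options -dM -E, passes CPPFLAGS and CFLAGS through, and maps a few well-known macros to tuples. Only Linux targets on x86, x86_64, arm and armeb are recognised; glibc is assumed because the C library is not visible in the compiler's predefined macros; DEB_HOST_ARCH and other distribution-specific settings are ignored.

```python
from __future__ import annotations

import os
import shlex
import subprocess
import sys


def parse_macros(text: str) -> dict[str, str]:
    """Parse `cc -dM -E` output ("#define NAME VALUE" lines) into a dict."""
    macros: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            macros[parts[1]] = parts[2] if len(parts) == 3 else ""
    return macros


def tuple_from_macros(m: dict[str, str]) -> str | None:
    """Map predefined compiler macros to a tuple, or None if unrecognised.

    Assumption: the C library is glibc (not detectable from these macros).
    """
    if "__linux__" not in m:
        return None
    syscall, libc = "linux", "glibc"
    if "__x86_64__" in m:
        if "__LP64__" not in m:
            return None  # an ILP32 ABI on x86_64 is not in the CPU table
        cpu = "x86_64"
    elif "__i386__" in m:
        cpu = "x86"
    elif "__arm__" in m:
        cpu = "armeb" if "__ARMEB__" in m else "arm"
        if "__ARM_EABI__" not in m:
            syscall = "linuxapcs"  # pre-EABI ARM syscall ABI
        elif "__ARM_PCS_VFP" in m:  # hard-float calling convention
            if cpu != "arm":
                return None  # no hard-float libc value proposed for armeb
            libc = "glibc_hf"
    else:
        return None
    return f"{cpu}-{syscall}-{libc}"


def predefined_macros() -> dict[str, str]:
    """Run the configured compiler on empty input and collect its macros."""
    cc = shlex.split(os.environ.get("CC") or "cc")
    flags = (shlex.split(os.environ.get("CPPFLAGS", ""))
             + shlex.split(os.environ.get("CFLAGS", "")))
    result = subprocess.run(cc + flags + ["-dM", "-E", "-x", "c", "-"],
                            input="", capture_output=True, text=True, check=True)
    return parse_macros(result.stdout)


def main() -> int:
    """Print the tuple and return 0, or explain on stderr and return 1."""
    try:
        macros = predefined_macros()
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"lsb_architecture: cannot query the compiler: {exc}", file=sys.stderr)
        return 1
    result = tuple_from_macros(macros)
    if result is None:
        print("lsb_architecture: no tuple defined for this target", file=sys.stderr)
        return 1
    print(result)
    return 0
```

Installed as an executable script whose entry point calls sys.exit(main()), this sketch would print x86_64-linux-glibc for a native x86_64 GCC, and arm-linux-glibc_hf for an ARM cross-compiler invoked with hard-float options, provided that compiler predefines __ARM_PCS_VFP. Keeping the macro-to-tuple mapping in a pure function separates the policy from the environment probing, so the mapping can be checked without a compiler installed.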
A preliminary reference implementation is provided here.
Endorsements
This proposal is submitted to the LSB WG for consideration by members of the Debian and Ubuntu communities:
- Steve Langasek
- Aurelien Jarno
- Wookey
- Loïc Minier