Multiarch Architecture Specifiers (Tuples)

This document describes an abandoned proposal to introduce a new schema for representing architecture ABIs as directory names. Further development of multiarch is proceeding using normalized GNU triplets.

This page is part of the larger effort to implement Multiarch library handling on Linux. For more information about Multiarch, see Multiarch, https://wiki.ubuntu.com/MultiarchSpec.

For notes on the BoF held at DebConf 10 on this topic, see Multiarch/DebConf10notes.

The problem

Multiarch specifies a method of making libraries for multiple, mutually-incompatible architectures installable on a single filesystem in a manner that ensures the same binaries can be used without modification on any system. To accomplish this, we require a unique identifier for each architecture, that is, for each mutually incompatible set of libraries that we want to be co-installable. We want these identifiers to be specified in as vendor-neutral a manner as possible, to ensure we retain maximal binary compatibility across Linux distributions.

Why not use GNU triplets?

Earlier proposals for multiarch did use GNU triplets as the components of the library path. Proof-of-concept implementations ran into two distinct cases where GNU triplets could not be used effectively in a cross-distribution manner:

Proposed solution

Given this imperfect mapping to GNU triplets, we recommend the introduction of an entirely new tuple scheme to use as the directory structure, rather than confusing matters by using triplets for most architectures and diverging from them for a smaller number of architectures. We propose an initial set of tuples for immediate use. The tuple scheme should be maintained by a neutral standards body, such as the LSB WG.

The idea of a tuple that encodes the multiple axes along which binary incompatibility can be introduced is a sound one. We propose a hierarchical scheme of tuples of the general form:

We say that the scheme is hierarchical because the meaning of each term depends on the terms that precede it; for instance, a linux syscall ABI refers to a different syscall table on an x86_64 CPU than it does on a PowerPC CPU. However, when terms are reused they should be used in a generally consistent manner: linux should describe the preferred syscall interface of the Linux kernel on the named CPU at the time of adoption, and should not be used to describe a BSD emulation mode.
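The hierarchical reading can be sketched in Python. This is a minimal illustration and not part of the proposal: the class name MultiarchTuple, the helper same_syscall_abi, and the hyphen separator are all assumptions made for the example. Only the three fields and the example values x86, x86_64, linux and glibc are taken from the text.

```python
from typing import NamedTuple

# Assumption: fields are joined with a hyphen. The proposal's exact
# textual form is not reproduced here.
SEPARATOR = "-"


class MultiarchTuple(NamedTuple):
    """Hypothetical container for the three fields described in the text."""

    cpu: str          # machine type, word size, and endianness
    syscall_abi: str  # usually the kernel name, e.g. "linux"
    libc_abi: str     # user space ABI, e.g. "glibc"

    def __str__(self) -> str:
        return SEPARATOR.join((self.cpu, self.syscall_abi, self.libc_abi))

    @classmethod
    def parse(cls, text: str) -> "MultiarchTuple":
        parts = text.split(SEPARATOR)
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected three non-empty fields: {text!r}")
        return cls(*parts)


def same_syscall_abi(a: MultiarchTuple, b: MultiarchTuple) -> bool:
    """Compare syscall ABIs hierarchically.

    The syscall term only has meaning relative to the CPU term before it,
    so two tuples share a syscall ABI only if both fields match.
    """
    return (a.cpu, a.syscall_abi) == (b.cpu, b.syscall_abi)
```

Under these assumptions, MultiarchTuple("x86_64", "linux", "glibc") and MultiarchTuple("x86", "linux", "glibc") both carry the term linux, yet same_syscall_abi reports them as different, because the syscall table it names depends on the CPU.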

CPU name

The first field encodes information about the machine type, word size, and endianness. Each value of this field implies a particular base instruction set; however, there is no requirement that libraries installed to a directory referencing a given CPU name be limited to code that runs on machines implementing only this base instruction set. For example, a CPU value of x86 refers to the instruction set first implemented on the 80386 processor, but most current implementations will make use of instructions first made available in the P5 and P6 processors; few if any will run on true 80386 processors. In this way, enabling the use of optional instruction set extensions should not require a different multiarch tuple. Information about such extensions can instead be encoded using systems such as glibc's hwcap mechanism.

However, in a case where a member of a processor family lacks support for instructions that are considered "legacy" or otherwise implements support for only a subset of the instructions that are part of the base CPU definition, and there is any need to support code both with and without the use of these instructions (which will normally be the case), a new CPU identifier should be defined. Possible examples are: ARM vs. Thumb-2; PowerPC vs. PowerPC SPE; and Motorola 68k vs. Coldfire 68k.

An initial set of values for this field is proposed below.

Syscall ABI

The second field encodes information about the syscall ABI of the implementation. That includes information such as the order and meaning of syscalls, as well as their calling convention. In most cases, there will be a one-to-one mapping between syscall ABIs and implementing kernels and we recommend using the kernel name as the value of the field. In other cases, a kernel implementation for a given CPU family may have gone through multiple syscall ABIs over time, or even simultaneously support multiple ABIs that are selectable at runtime (called "personalities" in Linux). In those cases, the kernel name will be used as the identifier for the predominant or default syscall ABI in use at the time of adoption and variants of the kernel name may be used for other supported syscall ABIs as necessary.

Calling conventions and libc ABI

The final field encapsulates the user space ABI: the ABI of the standard C library (set of functions implemented, type definitions, base symbol versions supported) and the register calling convention for inter-library functions.

It is assumed that all relevant implementations will have a corresponding C library, as without a standard library there is little value in producing shared libraries for a target and therefore no need for these to be co-installable on the filesystem. Nevertheless, the possible inclusion of libc-less targets in this schema influenced the decision to encode endianness in the CPU name field, even though in some cases a processor family may be endianness-agnostic.

As with syscall names, the same name will generally be used for a given libc implementation across multiple hardware architectures where applicable. Mutually incompatible ABIs provided by a given C library within a single architecture may commonly be represented as variations on the base name of the C library.

In the case of the GNU C library, the name "glibc" is preferred over "gnu" to avoid confusion between GNU triplets and multiarch tuples.

Supporting interfaces

To avoid divergent embedded implementations of architecture->tuple mappings in every piece of software that wants to make use of these paths, a standard commandline tool that encapsulates this mapping should be provided, similar to the config.guess and config.sub tools that are a standard part of GNU autotools. We recommend a name of lsb_architecture for this tool.

The tool should accept no commandline arguments and instead interrogate the configured build environment, respecting standard environment variables such as CC, CPPFLAGS and CFLAGS. It may also respect distribution-specific environment settings, such as DEB_HOST_ARCH. On success, the tool prints the string representation of the multiarch tuple to stdout and exits 0. If it cannot determine a tuple for the target, it exits non-zero and may print an explanatory error message to stderr.
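The calling contract described above can be sketched as follows. This is not the reference implementation. The mapping table is hypothetical and exists only to make the example run, the hyphen-separated tuple strings are an assumed format, and rejecting arguments with exit status 2 is a choice made for this sketch. Only the DEB_HOST_ARCH path is covered; probing the compiler named by CC with CPPFLAGS and CFLAGS is left out.

```python
import sys

# Hypothetical mapping for illustration only; it is NOT the proposal's table.
DEB_ARCH_TO_TUPLE = {
    "i386": "x86-linux-glibc",
    "amd64": "x86_64-linux-glibc",
}


def determine_tuple(env):
    """Return the tuple string for the environment, or raise LookupError."""
    arch = env.get("DEB_HOST_ARCH")
    if arch is None:
        # A fuller tool would fall back to interrogating CC/CPPFLAGS/CFLAGS.
        raise LookupError("DEB_HOST_ARCH is not set")
    if arch not in DEB_ARCH_TO_TUPLE:
        raise LookupError(f"no multiarch tuple known for {arch!r}")
    return DEB_ARCH_TO_TUPLE[arch]


def main(argv, env, stdout=sys.stdout, stderr=sys.stderr):
    """Implement the contract: tuple on stdout and 0, or non-zero on failure."""
    if len(argv) > 1:
        # The tool takes no arguments; status 2 is this sketch's choice.
        print("usage: lsb_architecture", file=stderr)
        return 2
    try:
        result = determine_tuple(env)
    except LookupError as exc:
        print(f"lsb_architecture: {exc}", file=stderr)
        return 1
    print(result, file=stdout)
    return 0
```

To run it as a command, a script would end with sys.exit(main(sys.argv, os.environ)) after importing os. Keeping main free of global state makes the exit codes easy to check without spawning a process.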

A preliminary reference implementation is provided here.

Endorsements

This proposal is submitted to the LSB WG for consideration by members of the Debian and Ubuntu communities: