Architecture variants for Debian / Ubuntu
Author: Michael Hudson-Doyle <michael.hudson@canonical.com> / <mwhudson@debian.org>
Abstract
We seek to add the concept of an "architecture variant" for Debian-based systems that will allow packages to be built targeting newer processors without bumping the baseline.
Rationale
Processor and ISA design does not stand still. It would be nice for our users to be able to use new instructions so that their applications run faster, use less energy, and so on.
Prior art
This is not a new idea, at all.
I believe there were some discussions around 2004/2005 about adding an i686 port in some fashion similar to what is proposed here, but I can't find them now.
It was discussed at DebConf11: https://wiki.debian.org/Multiarch/Debconf11MultiarchRelatedMinutes#Partial_arches.3F
It seems to have been an old idea in 2014 (see "Partial archives for different ISAs" in https://wiki.debian.org/Sprints/2014/BootstrapSprint/Results).
https://github.com/SIMDebian was a thing for a while.
There is https://wiki.debian.org/InstructionSelection which is more about how to use new instructions in particular places rather than a way to use the new instructions widely (which may be a better idea overall, who knows).
None of these options seems to have gotten off the ground, as far as I can see.
Options
There are a few ways to do this:
- Bump baseline
Pros:
- easy
Cons:
- abandons users
- Fat binary packages exploiting glibc hwcaps (i.e. have a library ship /usr/lib/$triplet/libfoo.so and /usr/lib/$triplet/$variant/libfoo.so)
Pros:
- pretty easy
- supports all users
- allows choice of baseline per package
Cons:
- per-package work (probably)
- binary packages get larger, impacting both users and mirrors
- more refined uses, e.g. container images/layers, also get impacted by larger size
- only works for shared libraries located by ld.so (so not for executables, python extensions, ...)
- not all architectures implement hwcaps
- Use ifuncs a lot more
Pros:
- supports all users
- allows very fine-grained selection of implementations
- works for almost all binaries (well, assuming PIE executables)
Cons:
- huge amount of very technical work
- binary packages get larger
- Add a new port for each new ISA (e.g. arm64v8.5)
Pros:
- we know how to add new ports
- no increase in binary footprint for users
- works for all binaries
Cons:
- it's the wrong abstraction: Debian architectures are really ABIs, and the ISA variants in play are not new ABIs.
- sub-point here: if you have an arm64 install and enable arm64v9 as a foreign architecture, which of libc6:arm64 and libc6:arm64v9 installs ld.so?
- harder to make sure users get the right images
- large impact on disk usage for mirrors
- some per-package work (e.g. amd64v3 would need to be added to debian/control for all packages that list amd64, and anything that dispatches off DEB_HOST_ARCH would need a check)
- harder to use 3rd party repos
- Add architecture variants and have apt install the best variant for each package
Pros:
- continues to support all users
- very little per-package work at all AFAICS
- no increase in binary footprint for users
- helps all binaries (executables/libraries/plugins)
- allows us to target a subset of packages
- makes crossgrades easier
- probably makes for a cleaner experience for changing the set of supported variants
Cons:
- requires introducing a new concept to Debian packaging
- requires work in dpkg/apt/builders/publisher
- harder to make sure users get the right images
- large impact on disk usage for mirrors
- allows us to target a subset of packages and thus get bogged down in deciding which packages are worth it
- some packages will require work to detect build for variant
Although it has the most unknown work, option 5 (architecture variants, the last option above) seems like the best option currently, and that's what this document is about.
Requirements
For a lot of software (e.g. britney) we can redefine what they currently call "architecture" to "ABI + delimiter + variant" (e.g. "arm64:v9" or "arm64_v9") and then continue as before. For other areas:
- building
- the build scheduler will need to dispatch builds for each variant
- the scheduler could dispatch different sets of builds for different packages
- the variant being targeted will need to be communicated to the build environment
- I think it will be OK to assume that the variant being targeted is supported by the build system
- the default compiler flags will need to be set to match the variant being targeted
- the .deb files made by the build need to record which variant they are for
- there needs to be a convention for the name of the .deb file so that packages for multiple variants can be published into the same pool
- publishing
- the .debs for each variant need to be published to the pool
- the apt lists will need to reference the .debs for each variant
- installing
- when installing a package, apt needs to select the package most appropriate for the current system
Which packages to build
It would be possible to selectively build packages for different variants.
It would certainly be simpler to not do this, to build every package for every supported variant, and some of the implementation thoughts below reflect this.
We should not pick a design that cannot support building a subset of packages for a variant.
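As a rough sketch of what "build everything by default, but keep subsets possible" could look like in a scheduler, assuming a hypothetical table of supported variants and a hypothetical per-package override (`SUPPORTED_VARIANTS`, `plan_builds` and `per_package` are illustrative names, not part of any existing tool):

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table of variants per architecture; None stands for the baseline.
SUPPORTED_VARIANTS: Dict[str, List[Optional[str]]] = {
    "amd64": [None, "v3"],
    "arm64": [None, "v8.5", "v9"],
}


def plan_builds(
    package: str,
    arch: str,
    per_package: Optional[Dict[str, List[str]]] = None,
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (package, arch, variant) build jobs for one source package.

    By default every supported variant is built. If `per_package` names the
    package, only the baseline plus the listed variants are built.
    """
    variants = SUPPORTED_VARIANTS.get(arch, [None])
    if per_package is not None and package in per_package:
        wanted = set(per_package[package])
        variants = [v for v in variants if v is None or v in wanted]
    return [(package, arch, v) for v in variants]
```

With no override, `plan_builds("hello", "arm64")` yields three jobs (baseline, v8.5, v9); passing `{"hello": ["v9"]}` drops the v8.5 job. The simple case needs no per-package data at all, which is the property the design should preserve.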
Implementation
dpkg
The variant should be recorded in the control file:
Architecture: amd64
ArchitectureVariant: v3
or
Architecture: arm64:v8.5
(probably the former)
Otherwise, I don't think /usr/bin/dpkg really needs to care. arm64:v8.5 packages are just arm64 packages wrt dependency satisfaction and all that.
Perhaps it should refuse to install a package that it knows won't run (it's tempting to keep variants completely opaque to packaging though...).
dpkg-deb will need a change to include the variant in the default archive name for --build.
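One possible naming convention, sketched under the assumption that the variant is appended to the architecture with an underscore (the delimiter and the `deb_filename` helper are hypothetical; nothing here reflects an actual dpkg-deb change):

```python
from typing import Optional


def deb_filename(
    package: str,
    version: str,
    arch: str,
    variant: Optional[str] = None,
    delimiter: str = "_",
) -> str:
    """Build a .deb file name; the variant suffix is an assumed convention.

    `version` is taken as given, i.e. already in the form it should have in
    the file name.
    """
    arch_part = arch if variant is None else f"{arch}{delimiter}{variant}"
    return f"{package}_{version}_{arch_part}.deb"


print(deb_filename("hello", "2.10-3", "arm64"))          # hello_2.10-3_arm64.deb
print(deb_filename("hello", "2.10-3", "arm64", "v8.5"))  # hello_2.10-3_arm64_v8.5.deb
```

Because baseline packages keep their current names, builds for several variants of the same version can sit next to each other in one pool directory without clashing.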
dpkg-dev
I assume the variant being targeted will be indicated by an environment variable.
Regardless of how the variant ends up in the control file, it probably makes sense for this to be a new environment variable – one of the benefits of the variant approach is not having to modify packages that reference DEB_HOST_ARCH etc. So adding a DEB_HOST_ARCH_VARIANT variable (defaulting to either empty or not set) sounds sensible here. For symmetry there will be BUILD and TARGET variants too (although I think we should document that it is an error to try to build targeting a variant on a system that does not support it).
This will require changes to dpkg-architecture and anything that passes flags through to dpkg-architecture (I think this is just dpkg-buildpackage).
Creating the package will require changes to dpkg-gencontrol to record the variant in DEBIAN/control and (as above) dpkg-deb to include the variant in the filename. I guess dpkg-genchanges should put the variant into the Architecture field of the .changes file as well (probably with a delimiter rather than a separate field).
dpkg-buildflags will inspect DEB_HOST_ARCH_VARIANT and configure $CFLAGS etc as appropriate to actually implement the behaviours we are interested in.
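To illustrate the kind of lookup this implies, here is a minimal sketch in Python (dpkg-buildflags itself is not written in Python). The variant names and the table are assumptions made for illustration; the `-march` values are existing GCC options, but which flags a real implementation would emit is not decided by this document.

```python
import os
from typing import Dict, List, Mapping, Tuple

# Hypothetical mapping from (architecture, variant) to a GCC -march value.
MARCH_FOR_VARIANT: Dict[Tuple[str, str], str] = {
    ("amd64", "v2"): "x86-64-v2",
    ("amd64", "v3"): "x86-64-v3",
    ("arm64", "v8.5"): "armv8.5-a",
    ("arm64", "v9"): "armv9-a",
}


def variant_cflags(
    host_arch: str, environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Return extra compiler flags for the variant named in the environment.

    An unset or empty DEB_HOST_ARCH_VARIANT means a baseline build, so no
    flags are added. An unknown variant is treated as an error.
    """
    variant = environ.get("DEB_HOST_ARCH_VARIANT", "")
    if not variant:
        return []
    try:
        march = MARCH_FOR_VARIANT[(host_arch, variant)]
    except KeyError:
        raise ValueError(f"unknown variant {variant!r} for {host_arch}") from None
    return [f"-march={march}"]
```

For example, `variant_cflags("amd64", {"DEB_HOST_ARCH_VARIANT": "v3"})` returns `["-march=x86-64-v3"]`, and a build with the variable unset gets no extra flags, so existing packages are unaffected.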
I'm not sure how we configure a given build or chroot to set DEB_*_ARCH_VARIANT by default. Something in /etc/dpkg? Or have sbuild etc. pass something through? dpkg-architecture and dpkg-buildpackage will grow --host-arch-variant / --target-arch-variant flags regardless, but I'm not sure that's the right way to set a default; it seems like it would be better to have a buildd chroot for each variant.
There will be work to support other toolchains too (e.g. Go has GOAMD64 now) but that's a much lower priority than the default C/C++ compiler.
apt
Basically my idea here is to tweak which Packages files apt downloads, but otherwise make no changes.
While apt could grow code to introspect the running system to work out which variants it supports, I think it would be easier and more explicit to have the (ordered!) list of variants to look for be a configuration item.
For a given repo, apt will download the Release file as usual and then select the best Packages list from the variants found in there.
For example, assume:
- an arm64 system configured to search for v8.5 and v9 variants
- a sources.list line like "http://archive.example.com testing main"
- archive.example.com contains builds for baseline and v8.5
Then apt should download something like http://archive.example.com/dists/testing/main/binary-arm64_v8.5/Packages.xz
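The selection in this example can be sketched as follows. This is not apt code: `choose_variant` and `packages_url` are hypothetical helpers, the configured list is assumed to be ordered most-preferred first, and the "_" delimiter in the directory name simply follows the URL above.

```python
from typing import Optional, Sequence


def choose_variant(
    preferred: Sequence[str], published: Sequence[str]
) -> Optional[str]:
    """Pick the first configured variant that the repository publishes.

    `preferred` is assumed to be ordered most-preferred first. None means
    "fall back to the baseline architecture".
    """
    available = set(published)
    for variant in preferred:
        if variant in available:
            return variant
    return None


def packages_url(
    base: str,
    suite: str,
    component: str,
    arch: str,
    variant: Optional[str],
    delimiter: str = "_",
) -> str:
    """Build the URL of the Packages.xz index for an architecture/variant."""
    arch_dir = arch if variant is None else f"{arch}{delimiter}{variant}"
    return f"{base}/dists/{suite}/{component}/binary-{arch_dir}/Packages.xz"


# The system looks for v9 first, then v8.5; the archive only publishes v8.5.
variant = choose_variant(["v9", "v8.5"], ["v8.5"])
print(packages_url("http://archive.example.com", "testing", "main", "arm64", variant))
```

If the repository published no matching variant, `choose_variant` would return None and the URL would be the usual binary-arm64 one, so repositories that know nothing about variants keep working.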
It would be possible to have apt download the Packages file for each variant published by the configured repository and select the best binary package at install time. This would make it more straightforward to have only a subset of packages built for each variant, but as my current thinking is to build all packages for each variant, it would only result in apt downloading more data than it needs. Building subsets for each variant can also be handled on the publication side, e.g. by including baseline or v8.5 debs in the v9 list when there is no v9 build for a given package.
sbuild
This might want to grow some flags to cause the DEB_*_ARCH_VARIANT variables to be set for a build, probably --arch-variant and so on. Or we could, as suggested above, define that --arch takes "arch + delimiter + variant".
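If --arch were to accept "arch + delimiter + variant", the parsing side is small. The sketch below assumes ":" as the delimiter and uses hypothetical helper names (`split_arch_spec`, `variant_environment`); it only shows how a combined value could be turned into the proposed DEB_HOST_ARCH_VARIANT variable, not how sbuild would actually do it.

```python
from typing import Dict, Optional, Tuple


def split_arch_spec(spec: str, delimiter: str = ":") -> Tuple[str, Optional[str]]:
    """Split "arm64:v8.5" into ("arm64", "v8.5"); plain "arm64" has no variant."""
    arch, sep, variant = spec.partition(delimiter)
    if not arch or (sep and not variant):
        raise ValueError(f"malformed architecture spec: {spec!r}")
    return arch, (variant or None)


def variant_environment(host_spec: str) -> Dict[str, str]:
    """Return the extra environment a build would need for `host_spec`.

    Baseline builds get an empty dict, so nothing changes for them.
    """
    _arch, variant = split_arch_spec(host_spec)
    if variant is None:
        return {}
    return {"DEB_HOST_ARCH_VARIANT": variant}


print(split_arch_spec("arm64:v8.5"))    # ('arm64', 'v8.5')
print(variant_environment("amd64:v3"))  # {'DEB_HOST_ARCH_VARIANT': 'v3'}
print(variant_environment("amd64"))     # {}
```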
dak
I don't really know anything about dak and am not actually proposing building Debian for a new variant at this point. But:
- the part that starts builds will need to be able to be configured as to which variants to build for each architecture, and (maybe) change how it invokes builds when building for a variant. This doesn't sound very difficult.
- The publication side will also need to generate lists for the variant builds, but that's not very different from adding a new architecture.
- If only a subset of packages is built for a variant then some merging of lists will have to be done at publication time and that sounds more complicated (which is one of the reasons for not wanting to do that).
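To give a feel for the list merging mentioned in the last point, here is a deliberately simplified sketch. It assumes each list is just a mapping from binary package name to a .deb path and that all lists carry the same versions; `merge_package_lists` and the pool paths are hypothetical, and a real publisher would also have to compare versions and handle per-package metadata.

```python
from typing import Dict, Sequence


def merge_package_lists(
    lists_by_preference: Sequence[Dict[str, str]],
) -> Dict[str, str]:
    """Merge package lists, most specific variant first.

    For each binary package the entry from the earliest list that contains
    it wins, so a v9 list falls back to v8.5 or baseline entries when no v9
    build exists for that package.
    """
    merged: Dict[str, str] = {}
    for package_list in lists_by_preference:
        for name, path in package_list.items():
            merged.setdefault(name, path)
    return merged


v9 = {"libfoo1": "pool/main/f/foo/libfoo1_1.0-1_arm64_v9.deb"}
baseline = {
    "libfoo1": "pool/main/f/foo/libfoo1_1.0-1_arm64.deb",
    "libbar2": "pool/main/b/bar/libbar2_2.3-1_arm64.deb",
}
merged = merge_package_lists([v9, baseline])
print(merged["libfoo1"])  # the v9 build
print(merged["libbar2"])  # falls back to the baseline build
```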
image building and so on
Mostly this is similar to a new architecture. The bootstrap tool (debootstrap or mmdebstrap or whatever) will need some changes to support downloading packages for a variant and configuring apt in the chroot to reference it. Other parts of image building should not need to change.
Mailing list discussion(s)
There is a thread on debian-dpkg starting here and continuing here.