Debian Cross Toolchain overview

Cross-toolchain building

The cross-toolchain has several components. The fundamental ones are:

But there are also necessary related parts:

And related packages such as build-essential/crossbuild-essential, dpkg-cross, and cross-pkg-config. The importance or otherwise of these is discussed later.

Binutils is straightforward as it does not have any target-arch dependencies and is easily built for any supported target arch, to produce a binutils-<triplet> package.

The interesting part is gcc. <triplet>-gcc needs a <target> libc-dev and <target> linux-libc-headers to build. Those can be generated in various ways, and the gcc packaging (until ver 4.9.1-18) supports two locations for them.

  1. in 'classic cross-compiler dirs': /usr/<triplet>/{lib,include}

  2. in multiarch locations: /usr/lib/<triplet>, /usr/include/<triplet>

The first case is the default, the second case is enabled by setting 'with_deps_on_target_arch_pkgs=yes'.
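As a minimal sketch of the two layouts listed above, the following Python helper maps a target triplet to the library and header directories in each arrangement. The function name sysroot_dirs and its multiarch flag are hypothetical names used for illustration only; they are not part of the gcc packaging.

```python
def sysroot_dirs(triplet, multiarch=False):
    """Return (library_dir, include_dir) for a target triplet.

    multiarch=False: classic cross-compiler dirs, /usr/<triplet>/{lib,include}
    multiarch=True:  multiarch locations, /usr/lib/<triplet> and /usr/include/<triplet>

    Hypothetical helper: it only illustrates the two path schemes.
    """
    if multiarch:
        return (f"/usr/lib/{triplet}", f"/usr/include/{triplet}")
    return (f"/usr/{triplet}/lib", f"/usr/{triplet}/include")


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, sysroot_dirs("arm-linux-gnueabihf", multiarch=flag))
```

The same triplet therefore names two different directory trees, which is why a system can end up holding both at once.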

These two basic build arrangements do not have obvious names, so various things have been used in discussions.

1 (classic cross-compiler dirs) has been called 'standalone', 'bootstrap', 'self-contained', 'default', 'supported'.

2 (multiarch locations) has been called 'MultiarchBuilt' (MAbuilt), 'Multiarch-Multiarch', 'Pure Multiarch' and 'with_deps_on_target_arch_pkgs' (w_d_o_t_a_p).

Cross Binutils

The cross-binutils package builds a set of binutils-<triplet> packages. This package is in jessie.

In Ubuntu the binutils-<triplet> packages are produced by the cross-toolchain-base-<arch> package.

Comparison of build methods

When the files are in multiarch locations they can be supplied by standard Debian packages of the target architecture. The build is more like a normal cross-build (BUILD!=HOST), except that here we have BUILD=HOST!=TARGET. The two arrangements therefore take their libc and kernel headers from different packages:

  1. libc-dev-<target>-cross, linux-libc-dev-<target>-cross

  2. libc-dev:<target>, linux-libc-dev:<target>

We refer to these build types as 'standalone', and 'multiarch-built', because in the first case the resulting cross gcc depends on -cross:all packages, and in the second case depends on foreign-arch :<target> packages.
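The difference between the two build types comes down to which package names the cross gcc depends on. A small Python sketch, using the naming patterns from the list above (the function name libc_build_deps is an assumption made for illustration):

```python
def libc_build_deps(target_arch, method):
    """Return the libc and kernel-header packages a cross gcc depends on.

    'standalone'      -> arch-all -cross packages
    'multiarch-built' -> foreign-arch :<target> packages

    Hypothetical helper; the name patterns follow the list in the text.
    """
    if method == "standalone":
        return [f"libc-dev-{target_arch}-cross",
                f"linux-libc-dev-{target_arch}-cross"]
    if method == "multiarch-built":
        return [f"libc-dev:{target_arch}",
                f"linux-libc-dev:{target_arch}"]
    raise ValueError(f"unknown build method: {method!r}")


if __name__ == "__main__":
    for method in ("standalone", "multiarch-built"):
        print(method, libc_build_deps("armhf", method))
```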

There are also two ways of getting the -cross packages. They can be copied from the :<target> packages (by moving files around using dpkg-cross), or they can be cross-built from libc and linux sources (currently also using dpkg-cross). Let's call these 'buildcross' and 'toolchain-base' builds.

Multiarch-Built (wdotap)

These build (for example) gcc-arm-linux-gnueabihf (and cpp,g++,etc) against the linux-libc-dev:armhf, libc6-dev:armhf, libstdc++-dev:armhf and libgcc1:armhf already in the archive. The build is quick and simple, but the resulting package has cross-arch dependencies (on the various :armhf packages), so you need to multiarch-enable the relevant foreign architecture (dpkg --add-architecture) to install them. You will need to enable multiarch for most cross-builds anyway, in order to install build-dependencies (unless building something with no library build-deps, like an upstream kernel, or bootloader).
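To make the installation step concrete, here is a hedged Python sketch that assembles the commands a user would typically run to install a multiarch-built cross compiler. It only builds and prints the command lines and does not execute them. The helper name install_commands is hypothetical, and the intermediate 'apt-get update' step is an assumption about the usual workflow after adding an architecture.

```python
import shlex


def install_commands(debian_arch, triplet):
    """Return the argv lists needed to install gcc-<triplet> with multiarch.

    Nothing is executed here; run the printed commands as root if wanted.
    """
    return [
        # Enable the foreign architecture so :<arch> dependencies resolve.
        ["dpkg", "--add-architecture", debian_arch],
        # Refresh package lists so the foreign-arch packages are known.
        ["apt-get", "update"],
        # Install the cross compiler, pulling in the :<arch> libraries.
        ["apt-get", "install", f"gcc-{triplet}"],
    ]


if __name__ == "__main__":
    for argv in install_commands("armhf", "arm-linux-gnueabihf"):
        print(shlex.join(argv))
```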

This is often referred to as 'wdotap' after the variable used to control it in the gcc packaging, 'with_deps_on_target_arch_pkgs'. This functionality was removed from gcc in 4.9.1-19, and the code has since been maintained as patches in cross-gcc-dev.

Standalone Build

These build (for example) linux-libc-dev-cross-armhf, libc-dev-cross-armhf, libstdc++-dev-cross-armhf, libgcc1-cross-armhf and gcc-arm-linux-gnueabihf (and cpp,g++,gfortran) from the kernel, glibc, and gcc sources, via the toolchain bootstrap process. Because the foreign-arch libraries are converted to arch-all packages, the toolchain does not need multiarch enabled to build or install.

The libraries are installed to the 'classic cross-compiler paths' (i.e. /usr/<triplet>/lib) in order to do this build.

Building this way you do end up with two copies of those libraries (libc, libgcc1, linux-libc-dev) on your system (when you later install multiarch build-deps for cross-building): one in /usr/<triplet>/lib (classic cross location) and one in /usr/lib/<triplet> (multiarch cross location).

Getting dependencies

If doing standalone builds the library dependency -cross packages have to come from somewhere. There are essentially two ways to get them: convert from existing foreign-arch libraries, or (cross) build everything from scratch.

Buildcross build

Download the existing libc:<target> and linux-libc-dev:<target> packages and convert them with dpkg-cross into libc-<target>-cross:all and linux-libc-dev-<target>-cross:all packages. Obviously the target arch packages need to have been natively built already.

This is how the emdebian packages were produced for many years.

Toolchain base build

This runs through the cross-toolchain bootstrap procedure in order to get a target-arch libc and linux-libc-dev to build against:

  1. build linux-libc-dev from linux-source and package it as linux-libc-dev-<target>-cross
  2. build stage1 <triplet>-gcc
  3. build stage1 glibc (using stage1 <triplet>-gcc)
  4. build stage2 <triplet>-gcc against stage1 glibc
  5. build stage2 glibc (using stage2 <triplet>-gcc)
  6. use dpkg-cross to convert stage2 glibc into libc-<target>-cross (and libc-dev-<target>-cross)

A variant of this procedure has been used to build Ubuntu cross-toolchains since precise. That includes a cross-binutils build as well as the linux, libc and gcc builds.

This method has the advantage of having no real build-deps, as it does the whole bootstrap, so it is useful when targeting architectures not in the archive.

Bootstrapping Builds

This is necessary when the target architecture is not in the debian archive as the libraries are not available to build against.

Multiarch vs Multilib

Multiarch and multilib are to some degree alternative ways of doing the same thing (providing a place for foreign libraries).

Targeting related architectures (such as i386/amd64, armel/armhf, mips/mipsel, powerpc/ppc64/ppc64el) can be done in two different ways. You can either build one cross-compiler for each target, and install whichever ones you need, and call them with triplet prefixes, or you can build one cross-compiler which has two or more multilibs installed, install just that one cross-compiler and use build options to control which code is output.

These options are not consistent across different architecture sets, whilst use of <triplet>-gcc always works:

x86 -m32 -m64

arm -mfloat-abi={hard,softfp}

mips -mabi={32,n32,64}

Other sets exist (powerpc,ppc64,ppc64el).

Using the <triplet>-gcc command works everywhere, for any target arch, either native or cross (these commands have been provided by the native compilers since wheezy), and it doesn't matter whether the compiler or cross-compiler was built with multilibs or not. So this is nice and consistent everywhere. Upstreams should be strongly encouraged to do this, and at least in Debian (where we know these triplet names are available in native gcc) package builds should use <triplet>-gcc even for native building, as then no special config is needed for cross-building. This is a very sensible meme to push upstream too.

Using <triplet>-gcc everywhere gives consistency across the distro and is simple for upstreams and packagers. It also makes build-dependencies orthogonal and consistent: if you need to make arm-linux-gnueabihf (armhf) binaries as well as arm-linux-gnueabi (armel) in a build then you depend on both compilers or cross-compilers. Otherwise you have to know which multilib arches (pairs/triplets) are implicit in a particular compiler/cross-compiler.
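The orthogonality argument can be sketched in a few lines of Python: each target architecture maps to one triplet, one compiler command, and one build-dependency, with no knowledge of multilib pairings needed. The table below contains only the two triplets named in the text; the names TRIPLETS, compiler_command and build_depends are hypothetical.

```python
# Debian architecture -> GNU triplet (only the two named in the text).
TRIPLETS = {
    "armhf": "arm-linux-gnueabihf",
    "armel": "arm-linux-gnueabi",
}


def compiler_command(debian_arch, tool="gcc"):
    """Return the triplet-prefixed tool name, e.g. arm-linux-gnueabihf-gcc."""
    return f"{TRIPLETS[debian_arch]}-{tool}"


def build_depends(debian_arches):
    """Return one gcc-<triplet> package per requested target, sorted."""
    return sorted({f"gcc-{TRIPLETS[arch]}" for arch in debian_arches})


if __name__ == "__main__":
    print(compiler_command("armhf"))
    print(build_depends(["armhf", "armel"]))
```

A build that needs both armhf and armel output simply lists both compilers; nothing has to be inferred from how either compiler was configured.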

However, many upstreams do use multilib compile options, especially in x86, where the use of -m32 and -m64 is common. We need to support those (or fix them all in packaging). This is particularly relevant for building things other than debian packages, where we do not get to fix what upstream does.

Statistics on how big an issue this is in the archive would be helpful.

This issue is orthogonal to how the toolchains are built, at least in principle. Since Jan 2015 we can build both multilib and non-multilib compilers using the multiarch-build method, and multilibbed compilers using the standalone method. Non-multilib compilers could fairly easily be built standalone too, but this is not currently supported.

Building multilibbed cross-toolchains is significantly more complicated than building plain multiarch ones, but the gcc packaging does already contain the machinery for doing this, both native and cross. However this also affects the cross-toolchain packaging. Without multilib the packaging is entirely target-arch-independent and the same rules file works for all architectures. (See cross-gcc and cross-gcc-dev in jessie, which demonstrate this by having an identical rules file for all target architectures.) With multilib the pairs/triplets have to be listed and different arches have different build-dependencies. Multilib builds are supported in cross-gcc from version 25 (2015-05-04).

The current (May 2015) MAbuilt debian cross-toolchains for jessie are not multilibbed and only support the <triplet>-gcc usage method (i.e. not -m32, -mfloat-abi=softfp or -mabi=n32). Install whichever targets you need, and use triplet- commands everywhere.

The unstable (May 2015) MAbuilt debian cross-toolchains are multilibbed for mips, mipsel, i386, amd64, and powerpc.

Packaging arrangements

Info on how the -cross packages are generated should go here.

Dealing with target architectures

The obvious way to build cross-compilers from the gcc source is to build them as part of the gcc package build. However that has two problems:

  1. The gcc build is already very long and produces a lot of packages
  2. Any cross-build failure will fail the whole package build and the gcc maintainer does not want this problem (getting everything native-building on the full set of debian arches is already more than enough complexity)

There are two possible approaches to this:

  1. Build cross-toolchains from separate source packages. The idea is that these are just as thin a veneer as possible over the gcc packages, with the correct control file for the necessary dependencies.
  2. Have wanna-build understand that different target-arch builds should be launched on the gcc-4.9 upload. This is a highly experimental idea, which may not work, but is worth investigating.

Because the build-dependencies are arch-dependent there have to be corresponding source packages for each target arch (with the correct control files). gcc-4.9 can generate these dependencies, but they have to be recorded in a static control file for the packages to be buildable in the archive.

So we have a set of packages like:

The cross-gcc-4.9-<arch> packages are actually identical except for <arch>, so they are generated from one template. This is important for maintenance, as it avoids having to maintain separate source packages. In cross-gcc_6 (experimental) this is taken further to produce a binary package cross-gcc-dev which supplies the core rules file used by all the per-target-arch source packages. This enables both having the preferred form of modification in the archive, and having one place to file bugs in the archive rather than 7.
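The one-template idea can be illustrated with a short Python sketch that renders a static control stanza per target architecture. This is not the actual cross-gcc template: the field values, in particular the Build-Depends list, are illustrative assumptions chosen from package names mentioned in this document, and render_control is a hypothetical helper.

```python
from string import Template

# Illustrative template only; the real cross-gcc control files differ.
CONTROL_TEMPLATE = Template(
    "Source: cross-gcc-4.9-$arch\n"
    "Build-Depends: gcc-4.9-source, libc6-dev:$arch, linux-libc-dev:$arch\n"
)


def render_control(arch):
    """Render the static control stanza for one target architecture."""
    return CONTROL_TEMPLATE.substitute(arch=arch)


if __name__ == "__main__":
    for arch in ("armhf", "armel"):
        print(render_control(arch))
```

Because the arch-dependent build-dependencies are written out by the generator, each resulting source package carries the static control file that the archive requires, while only the template itself has to be maintained.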

This is harder to achieve when multilib support is added because now things are no longer orthogonal and consistently named. Some arches are bi-arch, some tri-arch. A way to manage this is needed if multilib is to be supported.

In jessie and unstable there is no source cross-gcc (binary cross-gcc-dev) package in the archive: there is just a cross-gcc git repo in the cross-toolchain project, which is used to generate the cross-gcc-4.9-<arch> sources that are actually uploaded. This has been improved-upon in experimental, as described above.

Pros and cons

MA-Built

Standalone-build

Buildcross Build

Toolchain-base build

Handling cross compiler versions/defaults

Debian/Ubuntu Cross Toolchain history

Debian cross-toolchains have existed in various forms since around 2000.

First (circa 2000) was the 'toolchain-source' package, which was a copy of the gcc sources with the rules modified to build cross-compilers. This suffered from divergence from the normal gcc packages, with different versions, patches and bugs. The cross-support rules in it were then merged into the main gcc package, and gcc output a gcc-source package so that cross-toolchains could be built using that.

For many years (since 2004) the emdebian project used this functionality to build cross-toolchain binaries for Debian. These builds were done by using the libc/linux-headers from the target arch, converted to libc-<target>-cross/linux-libc-dev-<target>-cross with dpkg-cross, then building the package against those. The 'buildcross' tool was developed to mechanise this process, and build for multiple host architectures.

The problem with this method was that it could not easily be made into a standard package that would build in the archive, because there was no way to express the dependency on a foreign-arch libc/linux-libc-dev, and also because downloading as part of a package build is not permitted. Thus the packages lived outside the archive (at emdebian) for a decade or so, and became well used.

Whilst multiarch was being developed around 2009/10 it became clear that it could solve this problem of specifying foreign dependencies for cross-toolchains, and explicit-arch dependencies were included in the spec partly for that reason.

Meanwhile linaro wanted cross-toolchains in Ubuntu before all this was ready so packages were created (by Marcin) which ran the whole toolchain bootstrap procedure, build-depping on linux, binutils, libc, and gcc sources, and building linux-libc-dev-<target>-cross, binutils-<triplet>, libc-<target>-cross, gcc-<triplet>, via gcc stage1, libc stage1, gcc stage2, libc stage2, gcc stage3. This was the only way to build a cross-toolchain inside a standard package at the time. Those toolchains went into Ubuntu 10.10 and at the emdebian sprint at ARM in early 2011 it was planned that they would be fixed up to build on Debian and uploaded there too, until multiarch-built cross-toolchains were available/practical.

A GSOC 2012 project was done (by Thibaut Girka) to make the necessary changes to gcc for multiarch builds, and merged in late 2012. So now it was possible to build a cross-compiler by just depending on the foreign-arch libraries needed.

The upload of the full-bootstrap packages never got done, so Debian still had no in-archive cross-toolchains for wheezy, and the emdebian toolchains were no longer maintained as we expected a quick move to the new multiarch ones. That took much longer than expected, as is the way of things.

Multiarch-built cross-toolchains were working in 2013, but still could not be uploaded until the infrastructure learned about foreign-arch dependencies. Sbuild, wanna-build and britney needed changes. Sbuild was fixed in time for Jessie: it now automatically enables a foreign architecture if a package build-deps on one, so that the dependency can be installed during the build. Wanna-build also needed to be modified to pass the right options to dose when checking whether something has all its dependencies available. Similarly Britney needed to be taught to consider foreign arches when migrating packages.

These changes fell afoul of the need to have them already in stable before they can be used to build testing/unstable, so it will be post-jessie before all this is working in the archive.


Marcin notes

From here on down is 2010-vintage material, but still possibly of some interest.

Status

Currently we have two ways of building cross toolchains in the Debian/Ubuntu world:

EmDebian way

Should work in any Debian-derived distribution because of its simplicity. The problem is that it is a manual process which can be automated but is still impossible to run on a buildd, and as such it cannot be added to the Debian repository. EmDebian developers solved that by having a daemon which rebuilds the toolchain packages after they are updated in the Debian archive.

Another problem is the manual fetching of eglibc and linux packages for the target arch. But this part can be solved by using a multiarch-capable APT (apt-get -o APT::architecture=armel download libc6-dev).

Ubuntu way

The Ubuntu way handles building a cross toolchain differently: by a full bootstrap. Because the final gcc (gcc stage3 in bootstrap terminology) requires target headers to be available in /usr/$ARCH/ directories, I split the toolchain into two packages:

So far packages for gcc 4.4 and 4.5 have been created. A 4.6 version will follow soon; it will basically be a copy of the 4.5 one.

But how to get Ubuntu source packages working under Debian?

Experimental requirements

First we need binutils 2.21 and gcc-4.5 from experimental: they contain all the changes I made for Ubuntu 10.10 'maverick' and all later ones. Many things got cleaned up, and code duplication which was present for cross targets was eliminated in favour of reusing the native packaging as much as possible. The effect is that we have -dbg packages for all libraries and soon also -dbgsym ones. Some work may still need to be done to make sure that cross toolchains for all Debian architectures can be built and used.

In-progress packaging

The next requirements are armel-cross-toolchain-base and gcc-4.5-armel-cross from my git repository on the git.linaro.org server. The latter is the same as the Ubuntu one but has lowered build dependencies (Ubuntu has eglibc 2.12, Debian has 2.11, for example). The situation is worse with armel-cross-toolchain-base...

How it works

To bootstrap the cross toolchain I reuse the sources which are available in the *-source binary packages for the binutils/eglibc/gcc-4.5/linux-2.6 components. For binutils and gcc-4.[456] there is no problem as the changes are present.

Eglibc/Linux problems

The situation is worse with the eglibc and linux-2.6 -source packages as they do not provide the Debian packaging inside. I opened a bug against linux-2.6 but so far it has been refused with an answer like "wait for multiarch, it will solve your problem". I assume there will be a similar answer for eglibc but I will report a wishlist bug anyway. So far, as a workaround, I have included the whole eglibc packaging (4MB) inside armel-cross-toolchain-base, and the same with linux-2.6. The effect is ugly and unmaintainable, but at least I have something to test.

Build problems

Current Debian builds of the final eglibc fail on building "nscd/others". It is a linking problem: ld is not able to find ld-linux.so for some symbols. It links fine if I run the failing command line with the library added.

If the build fails at the "build-linux" stage, the cause is an incomplete copy of the linux-2.6 packaging; this was solved by making the copy complete.

Bootstrap order and dependencies

  1. binutils-cross sysrooted
  2. gcc1-cross (requires 1)
  3. linux-headers-cross
  4. eglibc1-cross (requires 2)
  5. gcc2-cross (requires 4, gives libgcc packages)
  6. eglibc-final-cross (requires 5, gives all eglibc packages)
  7. binutils-cross without sysroot (gives binutils-cross packages)

Why two builds of binutils? gcc1 and gcc2 are built with sysroot enabled as we do not have access to /usr/ARCH directories during the build. So we need a binutils which will also use sysroot.
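The bootstrap order above is effectively a dependency graph, and the listed sequence can be checked against the stated "requires" relations. The following Python sketch (Python 3.9+ for graphlib) encodes only the dependencies written in the list; steps with no stated requirement are given none, which is an assumption. The names REQUIRES, LISTED_ORDER and respects_dependencies are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# step -> set of steps it requires, as stated in the bootstrap order.
REQUIRES = {
    "binutils-cross (sysrooted)": set(),
    "gcc1-cross": {"binutils-cross (sysrooted)"},
    "linux-headers-cross": set(),
    "eglibc1-cross": {"gcc1-cross"},
    "gcc2-cross": {"eglibc1-cross"},
    "eglibc-final-cross": {"gcc2-cross"},
    "binutils-cross (no sysroot)": set(),
}

# Dicts preserve insertion order, so this is the order given in the text.
LISTED_ORDER = list(REQUIRES)


def respects_dependencies(order, requires):
    """Return True if every step appears after all the steps it requires."""
    seen = set()
    for step in order:
        if not requires[step] <= seen:
            return False
        seen.add(step)
    return True


if __name__ == "__main__":
    print(respects_dependencies(LISTED_ORDER, REQUIRES))
    # Any order produced by a topological sort is also acceptable.
    print(list(TopologicalSorter(REQUIRES).static_order()))
```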

Patches used

Multiarch future (view from 2011)

There is ongoing work on getting multiarch dpkg working for both the Debian and Ubuntu distributions. When it reaches its final state, both ways of building a cross compiler will have to change, because there will be no need to manually fetch target arch packages: we could just build-depend on them. But that's the future: the first stage of deploying multiarch will not give us this, because the whole build infrastructure of both distributions needs to be changed first.

But what will we have to do once we have final multiarch support? I think we will be able to abandon the armel-cross-toolchain-base package in favour of the binutils-cross one, as there will be no need to cross build eglibc or linux headers (we will just build-depend on the target packages).

On the Ubuntu side I will still maintain the (then deprecated) packages because of the LTS support which I promised to our users. But this part will not affect Ubuntu 'current' or Debian 'wheezy'.

Results

Common development on cross toolchains happens in the Cross Toolchain Team at Alioth under collab-maint.