Debian Cross Toolchain overview

Cross-toolchain building

The cross-toolchain has several components. The fundamental ones are:

But there are also necessary related parts:

And related packages such as build-essential/crossbuild-essential, dpkg-cross, and cross-pkg-config. The importance or otherwise of these is discussed later.

Binutils is straightforward, as it does not have any host-arch dependencies and is easily built for any supported target arch to produce a binutils-<triplet> package.

The interesting part is gcc. <triplet>-gcc needs a <target>-libc-dev and <target>-linux-libc-headers to build. Those can be generated in various ways, and the gcc packaging supports two locations for them.

  1. in 'classic cross-compiler dirs': /usr/<triplet>/{lib,include}

  2. in multiarch locations: /usr/lib/<triplet>, /usr/include/<triplet>

The first case is the default, the second case is enabled by setting 'with_deps_on_target_arch_pkgs=yes'.
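As a minimal sketch of the two layouts, the helper below returns the library and header directories for a given triplet. The function name and the dictionary shape are assumptions made for illustration; only the paths and the 'with_deps_on_target_arch_pkgs' setting come from the text above.

```python
def toolchain_dirs(triplet, multiarch=False):
    """Return the library and header directories for one target triplet.

    Hypothetical helper for illustration; not part of the gcc packaging.
    """
    if multiarch:
        # Layout selected by with_deps_on_target_arch_pkgs=yes
        return {"lib": f"/usr/lib/{triplet}", "include": f"/usr/include/{triplet}"}
    # Default layout: classic cross-compiler directories
    return {"lib": f"/usr/{triplet}/lib", "include": f"/usr/{triplet}/include"}


for flag in (False, True):
    print(toolchain_dirs("arm-linux-gnueabihf", multiarch=flag))
```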

These two basic build arrangements do not have obvious names, so various names have been used in discussions.

1 (classic cross-compiler dirs) has been called 'standalone', 'bootstrap', 'self-contained', 'default', 'supported'.

2 (multiarch locations) has been called 'MultiarchBuilt', 'Multiarch-Multiarch', 'Pure Multiarch' and 'with_deps_on_target_arch_pkgs'.

Comparison of build methods

When the files are in multiarch locations they can be supplied by standard Debian packages of the target architecture. The build is then more like a normal cross-build, except that in BUILD x HOST x TARGET terms we have build = host != target. The two arrangements depend on different packages:

  1. libc-dev-<target>-cross, linux-libc-dev-<target>-cross

  2. libc-dev:<target>, linux-libc-dev:<target>

We refer to these build types as 'standalone', and 'multiarch', because in the first case the resulting cross gcc depends on -cross:all packages, and in the second case depends on foreign-arch :<target> packages.
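The difference in dependencies can be sketched as follows. The function is a hypothetical illustration; the name patterns are copied from the list above, and the real package names in the archive may differ (for example in the libc version suffix).

```python
def libc_build_deps(target, multiarch):
    """Return the libc and kernel-header packages a cross gcc depends on.

    Hypothetical helper; the name patterns follow the list above.
    """
    if multiarch:
        # 'multiarch' build: foreign-arch packages of the target architecture
        return [f"libc-dev:{target}", f"linux-libc-dev:{target}"]
    # 'standalone' build: -cross packages (Architecture: all)
    return [f"libc-dev-{target}-cross", f"linux-libc-dev-{target}-cross"]


print(libc_build_deps("armhf", multiarch=False))
print(libc_build_deps("armhf", multiarch=True))
```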

There are also two ways of getting the -cross packages. They can be copied from the :<target> packages (by moving files around using dpkg-cross), or they can be cross-built from libc and linux (currently still using dpkg-cross).

Multiarch-Built

These build (for example) gcc-arm-linux-gnueabihf (and cpp,g++,gfortran) against the linux-libc-dev:armhf, libc6-dev:armhf, libstdc++-dev:armhf and libgcc1:armhf already in the archive. The build is quick and simple, but the resulting package has cross-arch dependencies (on the various :armhf packages), so you need to multiarch-enable the relevant foreign architecture (dpkg --add-architecture) to install them. You will need to enable multiarch for most cross-builds anyway, in order to install build-dependencies.

Standalone Built

These build (for example) linux-libc-dev-cross-armhf, libc-dev-cross-armhf, libstdc++-dev-cross-armhf, libgcc1-cross-armhf and gcc-arm-linux-gnueabihf (and cpp,g++,gfortran) from the kernel, glibc, and gcc sources, via the toolchain bootstrap process. Because the foreign-arch libraries are converted to build-arch packages, the toolchain does not need multiarch enabled to build, but the build is much longer and more complicated, and you end up with two copies of those libraries on your system.

This type of build is necessary when the target architecture is not in the Debian archive, as the libraries are then not available to build against.

Multiarch vs Multilib

Multiarch and multilib are to some degree alternative ways of doing the same thing (providing a place for foreign libraries).

Targeting related architectures (such as i386/amd64, armel/armhf, mips/mipsel, powerpc/ppc64/ppc64el) can be done in two different ways. You can either build one cross-compiler for each target, and install whichever ones you need, and call them with triplet prefixes, or you can build one cross-compiler which has two or more multilibs installed, install just that one cross-compiler and use build options to control which code is output.

i.e. on amd64 you can target i386 either by running:

or by running

These options are not consistent across different architecture sets, whilst use of <triplet>-gcc always works:

Using <triplet>-gcc everywhere gives consistency across the distro and is simple for upstreams and packagers. It also makes build-dependencies orthogonal and consistent: if you need to make arm-linux-gnueabihf (armhf) binaries as well as arm-linux-gnueabi (armel) in a build then you depend on both compilers or cross-compilers.
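A minimal sketch of why the triplet scheme is orthogonal: both the command name and the package to depend on can be derived mechanically from the triplet, with no per-architecture special cases. The helper name is an assumption; the 'gcc-<triplet>' package pattern follows gcc-arm-linux-gnueabihf as mentioned earlier.

```python
def cross_build_deps(triplets):
    """Return (compiler commands, package names) for the given GNU triplets.

    Hypothetical helper for illustration only.
    """
    commands = [f"{t}-gcc" for t in triplets]
    packages = [f"gcc-{t}" for t in triplets]
    return commands, packages


# A build producing both armhf and armel binaries depends on both compilers.
commands, packages = cross_build_deps(["arm-linux-gnueabihf", "arm-linux-gnueabi"])
print(commands)
print(packages)
```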

However, many upstreams do use multilib compile options, especially in x86, where the use of -m32 and -m64 is common. We need to support those (or fix them all in packaging). This is particularly relevant for building things other than debian packages, where we do not get to fix what upstream does.

Statistics on how big an issue this is in the archive would be helpful.

This issue is orthogonal to how the toolchains are built, at least in theory. However, at the time of writing (end of 2014) the two are entangled, because multilib builds do not work with the multiarch-built method on all arches (YunQiang Su says it is working for mips, but it does not work for arm). So if you want multilibs you need to do a standalone build. This is currently under investigation.

Building multilibbed cross-toolchains is significantly more complicated than building plain multiarch ones, but the gcc packaging does already contain the machinery for doing this, both native and cross. However this also affects the cross-toolchain packaging. Without multilib the packaging is entirely orthogonal and the same rules file works for all architectures. (See cross-gcc in experimental which demonstrates this). With multilib the pairs/triplets have to be listed in several places. It may be possible to generalise this but it is not currently done.

Thus the current MA-built Debian cross-toolchains for jessie and unstable are not multilibbed and only support the <triplet>-gcc method (i.e. not -m32 or -mfloat-abi=softfp). Install whichever targets you need, and use triplet- commands everywhere.

Debian/Ubuntu Cross Toolchain history

Debian cross-toolchains have existed in various forms since around 2000.

First (circa 2000) was the 'toolchain-source' package, which was a copy of the gcc sources with the rules modified to build cross-compilers. This suffered from divergence from the normal gcc packages, with different versions, patches and bugs. The cross-support rules in it were later merged into the main gcc package, and gcc started to output a gcc-source package so that cross-toolchains could be built using that.

For many years (since 2004) the emdebian project used this functionality to build cross-toolchain binaries for Debian. These builds were done by using the libc/linux-headers from the target arch, converted to libc-<target>-cross/linux-libc-dev-<target>-cross with dpkg-cross, then building the package against those. The 'buildcross' tool was developed to mechanise this process, and build for multiple host architectures.

The problem with this method was that it could not easily be made into a standard package that would build in the archive, because there was no way to express the dependency on a foreign-arch libc/linux-libc-dev, and also because downloading as part of a package build is not permitted. Thus the packages lived outside the archive (at emdebian) for a decade or so, and became well used.

Whilst multiarch was being developed around 2009/10 it became clear that it could solve this problem of specifying foreign dependencies for cross-toolchains, and explicit-arch dependencies were included in the spec partly for that reason.

Meanwhile Linaro wanted cross-toolchains in Ubuntu before all this was ready, so packages were created (by Marcin) which ran the whole toolchain bootstrap procedure. They build-depended on the linux, binutils, libc, and gcc sources, and built linux-libc-dev-<target>-cross, binutils-<triplet>, libc-<target>-cross and gcc-<triplet> via gcc stage1, libc stage1, gcc stage2, libc stage2 and gcc stage3. This was the only way to build a cross-toolchain inside a standard package at the time. Those toolchains went into Ubuntu 10.10, and at the emdebian sprint at ARM in early 2011 it was planned that they would be fixed up to build on Debian and uploaded there too, until multiarch-built cross-toolchains were available/practical.

A GSOC 2012 project (by Thibaut Girka) made the necessary changes to gcc for multiarch builds, and these were merged in late 2012. From then on it was possible to build a cross-compiler by just depending on the foreign-arch libraries needed.

The upload of the full-bootstrap packages never got done, so Debian still had no in-archive cross-toolchains for wheezy, and the emdebian toolchains were no longer maintained because we expected a quick move to the new multiarch ones. That took much longer than expected, as these things do.

Multiarch-built cross-toolchains were working in 2013, but still could not be uploaded until the infrastructure learned about foreign-arch dependencies. Sbuild, wanna-build and britney needed changes. Sbuild was fixed in time for Jessie: it now automatically enables a foreign architecture if a package build-depends on one, so that the dependency can be installed during the build. Wanna-build also needed to be modified to pass the right options to dose when checking whether something has all its dependencies available. Similarly, Britney needed to be taught to consider foreign arches when migrating packages.

These changes fell afoul of the need to have them already in stable before they can be used to build testing/unstable, so it will be stretch before all this is working in the archive.

Packaging arrangements

Info on how the -cross packages are generated

Dealing with architectures

The obvious way to build cross-compilers from the gcc source is to build them as part of the gcc package build. However that has two problems:

  1. The gcc build is already very long and produces a lot of packages
  2. Any cross-build failure will fail the whole package build and the gcc maintainer does not want this problem

There are two possible approaches to this:

  1. Build cross-toolchains from separate source packages. The idea is that these are just as thin a veneer as possible over the gcc packages, with the correct control file for the necessary dependencies.
  2. Have wanna-build understand that different target-arch builds should be launched on the gcc-4.9 upload. This is a highly experimental idea, which may not work, but is worth investigating.

Because the build-dependencies are arch-dependent there have to be corresponding source packages for each target arch (with the correct control files). gcc-4.9 can generate these dependencies, but they have to be recorded in a static control file for the packages to be buildable in the archive.

So we have a set of packages like:

The cross-gcc-4.9-arch packages are actually identical except for <arch> so are generated from one template. This is important for maintenance, and having one place to file bugs in the archive, not 7. In cross-gcc_6 (experimental) this is taken further to produce a binary package cross-gcc-dev which supplies the core rules file used by all the per-target-arch source packages.
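The single-template approach can be sketched as below. This is an illustration under stated assumptions: the stanza is heavily simplified and hypothetical, the Build-Depends list is not the one the real packages use, and render_control is an invented helper name. It only shows the idea of one template that differs per target solely in <arch>.

```python
from string import Template

# Hypothetical, heavily simplified control stanza (assumption, not the real template).
CONTROL_TEMPLATE = Template(
    "Source: cross-gcc-4.9-$arch\n"
    "Build-Depends: gcc-4.9-source, libc6-dev:$arch, linux-libc-dev:$arch\n"
)


def render_control(arch):
    """Fill the shared template for one target architecture."""
    return CONTROL_TEMPLATE.substitute(arch=arch)


# One template, one static control file per target arch.
for arch in ("armhf", "armel"):
    print(render_control(arch))
```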

This is harder to achieve when multilib support is added because now things are no longer orthogonal and consistently named. Some arches are bi-arch, some tri-arch. A way to manage this is needed if multilib is to be supported.

Pros and cons

MA-Built

Standalone-built

Older text

Handling cross compiler versions/defaults

Marcin notes

Status

Currently we have two ways of building cross toolchains in the Debian/Ubuntu world:

EmDebian way

This should work in any Debian-derived distribution because it is so simple. The problem is that it is a manual process; it can be automated, but it is still impossible to do on a buildd, and as such it cannot be added to the Debian repository. EmDebian developers solved that by having a daemon which rebuilds the toolchain packages after they are updated in the Debian archive.

Another problem is the manual fetching of the eglibc and linux packages for the target arch. This part can be solved by using a multiarch-capable APT (apt-get -o APT::architecture=armel download libc6-dev).

Ubuntu way

The Ubuntu way handles building of the cross toolchain differently: by a full bootstrap. Because the final gcc (gcc stage3 in bootstrap terminology) requires target headers to be available in /usr/$ARCH/ directories, I split the toolchain into two packages:

So far packages for gcc 4.4 and 4.5 have been created. A 4.6 version will follow soon; it will basically be a copy of the 4.5 one.

But how to get Ubuntu source packages working under Debian?

Experimental requirements

First we need binutils 2.21 and gcc-4.5 from experimental. They contain all the changes I made for Ubuntu 10.10 'maverick' and all later ones. Many things got cleaned up, and the code duplication which was present for cross targets was eliminated in favour of reusing the native packaging as much as possible. The effect is that we have -dbg packages for all libraries, and soon also -dbgsym ones. Some work may still need to be done to make sure that cross toolchains for all Debian architectures can be built and used.

In-progress packaging

The next requirements are armel-cross-toolchain-base and gcc-4.5-armel-cross from my git repository on the git.linaro.org server. The latter is the same as the Ubuntu one but has its build dependencies lowered (Ubuntu has eglibc 2.12, Debian has 2.11, for example). The situation is worse with armel-cross-toolchain-base...

How it works

To bootstrap the cross toolchain I reuse the sources which are available in the *-source binary packages for the binutils/eglibc/gcc-4.5/linux-2.6 components. For binutils and gcc-4.[456] there is no problem, as the changes are present.

Eglibc/Linux problems

The situation is worse with the eglibc and linux-2.6 -source packages, as they do not provide the Debian packaging inside. I opened a bug against linux-2.6, but so far it has been refused with an answer like "wait for multiarch, it will solve your problem". I assume there will be a similar answer for eglibc, but I will report a wishlist bug anyway. So far, as a workaround, I have included the whole eglibc packaging (4MB) inside armel-cross-toolchain-base, and the same for linux-2.6. The effect is ugly and non-maintainable, but at least I have something to test.

Build problems

Current Debian builds of the final eglibc fail on building "nscd/others". It is a linking problem, as ld is not able to find ld-linux.so for some symbols. It links fine if I run the failing line with the library added.

If the build fails at the "build-linux" stage, the cause is an incomplete copy of the linux-2.6 packaging; this was solved by making the copy complete.

Bootstrap order and dependencies

  1. binutils-cross sysrooted
  2. gcc1-cross (requires 1)
  3. linux-headers-cross
  4. eglibc1-cross (requires 2)
  5. gcc2-cross (requires 4, gives libgcc packages)
  6. eglibc-final-cross (requires 5, gives all eglibc packages)
  7. binutils-cross without sysroot (gives binutils-cross packages)
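The ordering constraints above can be checked mechanically. The sketch below encodes only the "requires" relations stated in the list; the data structure and the function name are assumptions made for illustration, and any dependencies the list leaves implicit (for example on the kernel headers) are deliberately not modelled.

```python
# (step name, names of the steps it requires), in the order listed above
BOOTSTRAP_STEPS = [
    ("binutils-cross sysrooted", []),
    ("gcc1-cross", ["binutils-cross sysrooted"]),
    ("linux-headers-cross", []),
    ("eglibc1-cross", ["gcc1-cross"]),
    ("gcc2-cross", ["eglibc1-cross"]),
    ("eglibc-final-cross", ["gcc2-cross"]),
    ("binutils-cross without sysroot", []),
]


def order_is_valid(steps):
    """Return True if every step appears after all the steps it requires."""
    done = set()
    for name, requires in steps:
        if any(req not in done for req in requires):
            return False
        done.add(name)
    return True


print(order_is_valid(BOOTSTRAP_STEPS))
```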

Why two builds of binutils? gcc1 and gcc2 are built with sysroot enabled, as we do not have access to the /usr/ARCH directories during the build. So we need a binutils which also uses the sysroot.

Patches used

Multiarch future (view from 2011)

There is ongoing work on getting multiarch dpkg working for both the Debian and Ubuntu distributions. When it reaches its final state, both ways of building a cross compiler will have to change, because there will be no need to manually fetch target-arch packages: we could just build-depend on them. But that is the future. The first stage of deploying multiarch will not give us this, because the whole build infrastructure of both distributions needs to be changed first.

But what will we have to do once we have final multiarch support? I think that we will be able to abandon the armel-cross-toolchain-base package in favour of a binutils-cross one, as there will be no need to cross-build eglibc or the linux headers (we will just build-depend on the target packages).

On the Ubuntu side I will still maintain the (then deprecated) packages because of the LTS support which I promised to our users. But this part will not affect Ubuntu 'current' or Debian 'wheezy'.

Results

Common development on cross toolchains happens in Cross Toolchain Team at Alioth under collab-maint