This wiki page is intended to describe the issues of, and mechanisms for, being able to bootstrap Debian from sources.
There is a real need to bootstrap Debian from sources when doing new ports or flavours. Every new architecture or ABI flavour needs to do this at least once, and making it easier than the current 'really very hard' would be great. It is also very useful for cross-compiling to new or non-self-hosted architectures, and for a genuinely new arch at least part of the system (toolchains+build-essential) has to be cross-built until there is enough to become self-hosting.
Recent new bootstraps have been done for sh4, armhf, uclibc and avr32. More are coming down the line. The subarch flavoured rebuilds are particularly useful on ARM and MIPS architectures.
An important principle is that the packaging changes necessary for this to work are reasonably clear and transparent. A Debian packager should not have to understand this stuff (staged builds and cross-building) in loving detail to avoid breaking things whilst making maintenance changes. We will use this principle when deciding between different technically-satisfactory ways of achieving things.
Two basic approaches to this problem have been suggested.
1) Using a tool/distro like Yocto, OpenEmbedded or Scratchbox to generate a working rootfs from upstream sources, adding fake base packages and then native-building real Debian packages from there. This needs a huge amount of manual dependency-loop breaking, although some holes can be filled by OE packages and fake deps. In practice something like this is what tends to happen currently.
2) Making the Debian packages themselves support minimal builds, sufficient to break build-dependency loops, and support cross-building, so that an initial native-building system can be produced.
The main advantage of the latter case is that the necessary metadata and build rules get into the packages themselves in an ordered fashion and thus will tend not to get bitrot. It also greatly simplifies the build-tools because they have all the data needed in the package files, and the process is repeatable.
The disadvantage is that it is more work up front and needs build-system and policy changes.
Bootstrapping is closely tied to support for cross-building Debian packages because at least part of the process must be done cross. All of 'essential' at least needs to be cross-buildable. You cannot magic a system out of thin air. To move from cross to native building you need build-essential to be cross-buildable too.
Debian/Ubuntu cross-building is documented here: https://wiki.linaro.org/CrossBuilding
Patched sources to make Ubuntu Maverick base packages cross-buildable are here: https://launchpad.net/~peter-pearse/+archive/cross-source
For the process to be reliable cross-dependency metadata needs to be in packages, as specified here: https://wiki.ubuntu.com/MultiarchCross That can't happen until build tools everywhere can parse the decorated build-dependency data. So patches are being maintained outside the main archives as a proof-of-technology.
Bootstrapping could be managed by a tool like xdeb or sbuild, keeping track of staged builds and rebuilding things as needed so that staged builds don't hang around any longer than necessary. However, any such tool could get out of sync with the current status unless that status is always determinable from the current package-set state. This spec attempts to define things such that the state is always intrinsic to the package set itself. Please speak up if you see ways that this isn't going to work.
The toolchain has a complex bootstrapping process involving binutils, gcc, libc and kernel-headers. It has been fixed up (in Ubuntu/Linaro) to bootstrap itself. This has currently only been demonstrated on armel. Once tested/extended to other architectures it can be uploaded to Debian. This work is ongoing, by Marcin Juszkiewicz and Hector Oron.
The toolchain has also had 'flavoured builds' added so that it is easy to rebuild the toolchain locally for a different default CPU/ISA/optimisation unit (e.g. with/without VFP, or for the v5/v7 instruction set on ARM).
Circular dependencies/staged builds
The main issue is circular build-dependencies. These fall into three main areas:
- Most languages depend on themselves to build (gcc, openjdk, mono, haskell, perl, python).
- Libraries sometimes depend on each other circularly, for example:
  kerberos -> ldap -> kerberos
  qt -> poppler -> cups -> qt
- Documentation packages. Many packages need documentation tools (sgmltools, jade, tex, doxygen) which cannot be built until many other packages are built. This is a much worse problem for native-builds than cross-builds.
The generic way to deal with all of these is 'staged builds', where a version of the package is built with lesser functionality and thus a smaller dependency tree. This allows the depending package to then be built, then for the 'staged' package to be built normally.
This could be controlled by some kind of tool that kept track of which packages have currently been built as 'staged' packages and thus need rebuilding, but if we can correctly encode things in dependencies then this process can be made automatic and intrinsic. Exactly how this needs to be done is the subject of ongoing study.
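As a rough illustration of how the build order could fall out of dependency data alone, here is a minimal Python sketch. It assumes two hypothetical in-memory maps (not produced by any existing tool): one giving the normal build-dependencies of each source package, and one giving the reduced dependencies of packages that offer a staged build. Whether packages built against a staged package also need rebuilding is left out of the sketch.

```python
def plan_builds(full_deps, staged_deps):
    """Return a list of (package, kind) build steps.

    full_deps:   {source: set of sources needed for a normal build}
    staged_deps: {source: reduced set needed for a staged build}
    Both maps are hypothetical inputs invented for this sketch.
    """
    built = set()   # packages available in any form (staged or normal)
    final = set()   # packages that have had their normal build
    steps = []
    while len(final) < len(full_deps):
        progress = False
        # Prefer normal builds whenever their dependencies are satisfied.
        for pkg in sorted(full_deps):
            if pkg not in final and full_deps[pkg] <= built:
                steps.append((pkg, "normal"))
                built.add(pkg)
                final.add(pkg)
                progress = True
        if progress:
            continue
        # Nothing can be built normally: break a loop with one staged build.
        for pkg in sorted(staged_deps):
            if pkg not in built and staged_deps[pkg] <= built:
                steps.append((pkg, "staged"))
                built.add(pkg)
                progress = True
                break
        if not progress:
            raise RuntimeError("unbreakable build-dependency loop")
    return steps


# The krb5/openldap loop, with krb5 offering a staged build without ldap.
steps = plan_builds(
    {"krb5": {"openldap"}, "openldap": {"krb5"}},
    {"krb5": set()},
)
```

In this run the staged krb5 is built first, then openldap, then krb5 again as a normal build, so the staged package is replaced as soon as its full dependencies exist.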
A partial spec has been proposed here: https://wiki.ubuntu.com/Specs/M/ARMAutomatedBootstrap This document fills out that spec and proposes some further ideas and changes.
'Staged' builds are invoked by using the DEB_STAGE variable to specify a staged build to dpkg-buildpackage. When it is unset, empty or zero, a normal build occurs. Some packages may need more than one staged build. We do not know what the maximum number of stages needed is: it may be two, but to assume so would be foolish, so counting up from negative numbers makes it clear what is going on. So we count up from DEB_STAGE=-n through DEB_STAGE=-1 to 'normal' (or DEB_STAGE=0). (This is a change from ARMAutomatedBootstrap, which simply has stages counting upwards 1, 2, ... N.) This isn't a big deal, but does help make clear to the average packager what is going on.
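The numbering convention can be sketched in Python as follows. This is only an illustration of the rule described above; the function name is hypothetical, and rejecting positive values is an assumption of this sketch rather than something the spec states.

```python
import os


def parse_deb_stage(environ=None):
    """Return 0 for a normal build, or a negative number for a staged build."""
    environ = os.environ if environ is None else environ
    raw = environ.get("DEB_STAGE", "").strip()
    if raw == "":
        return 0                # unset or empty: normal build
    stage = int(raw)            # raises ValueError on non-numeric input
    if stage > 0:
        # Assumption: stages count up from -n to 0, so positives are errors.
        raise ValueError("DEB_STAGE must be zero or negative")
    return stage
```

A build tool could then run stages in increasing order, for example -2, then -1, then the normal build at 0.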
Any 'staged' package must be identified as such in the metadata so it is not accidentally uploaded as a 'real' package. Is the 'UNRELEASED' codename indicator sufficient or do we need something more explicit: e.g. X-Staged-Build:N header?
It must be possible for the build-tools to identify what build-stages are available. ARMAutomatedBootstrap proposes Build-Depends-StageN headers, one for each stage. The existence of that defines such a stage as being available.
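A minimal Python sketch of how a build tool might discover the available stages from the source stanza of debian/control. It assumes field names of the form Build-Depends-Stage1, Build-Depends-Stage2 and so on, as in ARMAutomatedBootstrap; how those names would map onto the negative DEB_STAGE numbering is not specified here, so the sketch simply reports the numbers it finds. The function name is hypothetical.

```python
import re

# Field names are matched case-insensitively, as control field names are.
STAGE_FIELD = re.compile(r"^Build-Depends-Stage(\d+)\s*:", re.IGNORECASE)


def available_stages(control_text):
    """Return the sorted stage numbers declared in the source stanza."""
    stages = set()
    for line in control_text.strip().splitlines():
        if not line.strip():
            break               # a blank line ends the first (source) stanza
        match = STAGE_FIELD.match(line)
        if match:
            stages.add(int(match.group(1)))
    return sorted(stages)


# Abbreviated, illustrative control file (not the real krb5 packaging).
example_control = """Source: krb5
Build-Depends: debhelper (>= 7), libldap2-dev
Build-Depends-Stage1: debhelper (>= 7)

Package: krb5-ldap
Architecture: any
"""
found = available_stages(example_control)
```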
Let's consider kerberos as a typical example of a library package involved in a circular dependency. krb5 needs libldap2-dev (from openldap) to build. openldap needs libkrb5-dev (from krb5) to build. To fix this we add a staged build to krb5 that skips generation of the krb5-ldap package. This is easy to do with a debhelper-based package by simply setting DH_OPTIONS="--no-package=krb5-ldap", and running configure with --without-ldap (when DEB_STAGE is "-1").
Dealing with changed build dependencies
The obvious way to define Build-Depends-StageN is simply to list all the build-dependencies again except changing or missing out some as required. The disadvantage of this is that it will tend to get bit-rot as it has to be maintained along with the normal build-depends. It would be very nice to just list a 'diff' from the normal build-dependency list - i.e. 'except package-foo' or 'package-minimal instead of package'. I'm not sure this is practical, but if anyone can work out how to do it...
So for krb5 we'd add either:
Build-Depends-StageN: except libldap2-dev
or:
Build-Depends-StageN: debhelper (>= 7), byacc | bison, comerr-dev, docbook-to-man, libkeyutils-dev [!kfreebsd-i386 !kfreebsd-amd64 !hurd-i386], libncurses5-dev, libssl-dev, ss-dev, texinfo
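To show that the 'diff' form is at least mechanically workable, here is a small Python sketch that expands an "except ..." value against the normal Build-Depends. The "except" keyword is the proposal from this page, not an existing dpkg feature. The function names are hypothetical, and dropping a whole clause when any of its alternatives is excluded is an assumption made for this sketch.

```python
import re

# Debian package names: lowercase letters, digits, plus, minus and dot.
NAME = re.compile(r"[a-z0-9][a-z0-9+.\-]*")


def clause_names(clause):
    """Package names of every alternative in one comma-separated clause."""
    names = set()
    for alt in clause.split("|"):
        match = NAME.match(alt.strip())
        if match:
            names.add(match.group(0))
    return names


def apply_except(build_depends, stage_field):
    """Expand a staged build-dependency field into a full dependency list."""
    words = stage_field.split()
    if not words or words[0] != "except":
        return stage_field.strip()      # a full list was given; use it as is
    excluded = {word.strip(",") for word in words[1:]}
    kept = []
    for clause in build_depends.split(","):
        clause = clause.strip()
        if clause and not (clause_names(clause) & excluded):
            kept.append(clause)
    return ", ".join(kept)


normal = "debhelper (>= 7), byacc | bison, comerr-dev, libldap2-dev, libssl-dev"
staged = apply_except(normal, "except libldap2-dev")
```

Because the staged list is derived from the normal one, a maintainer who adds a new build-dependency only has to edit Build-Depends.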
A suggested syntax is to use a virtual architecture 'bootstrap' to specify build-deps that are not used for staged builds (i.e a syntax for 'Build recommends'):
Build-Depends: debhelper (>= 7), byacc | bison, comerr-dev, docbook-to-man, libkeyutils-dev [!kfreebsd-i386 !kfreebsd-amd64 !hurd-i386], libncurses5-dev, libssl-dev, ss-dev, texinfo, libldap2-dev [!bootstrap]
This is neat, but only allows for one 'stage' of build. This is a problem if more than one stage is actually needed - this is not yet clear (beyond the toolchain).
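A minimal Python sketch of how a tool might interpret the proposed 'bootstrap' virtual architecture. The function name is hypothetical and the parsing is deliberately simplified (it ignores version constraints and real architecture matching): in a bootstrap build any alternative marked [!bootstrap] is dropped, and in a normal build only the virtual marker is removed.

```python
import re

ARCH_LIST = re.compile(r"\[([^\]]*)\]")


def resolve_bootstrap(build_depends, bootstrap):
    """Return the Build-Depends string to use for this kind of build."""
    kept = []
    for clause in build_depends.split(","):
        alts = []
        for alt in clause.split("|"):
            alt = alt.strip()
            if not alt:
                continue
            match = ARCH_LIST.search(alt)
            archs = match.group(1).split() if match else []
            if "!bootstrap" in archs:
                if bootstrap:
                    continue    # not needed for the staged build
                # Normal build: strip only the virtual architecture marker.
                rest = [arch for arch in archs if arch != "!bootstrap"]
                repl = "[" + " ".join(rest) + "]" if rest else ""
                alt = (alt[:match.start()] + repl + alt[match.end():]).strip()
            alts.append(alt)
        if alts:
            kept.append(" | ".join(alts))
    return ", ".join(kept)


deps = ("debhelper (>= 7), libkeyutils-dev [!kfreebsd-i386 !hurd-i386], "
        "texinfo, libldap2-dev [!bootstrap]")
for_bootstrap = resolve_bootstrap(deps, True)
for_normal = resolve_bootstrap(deps, False)
```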
For packages which depend on themselves (usually languages), the build-dependencies should be changed to depend on lang | lang-bootstrap. In a normal repository the (native version of) lang-bootstrap will not be available, so lang will be used. In a bootstrapping environment lang may well not be available, in which case lang-bootstrap needs to be built. The bootstrapping tool knows to do a staged build in this case.
Setting DEB_STAGE and building this package causes it to produce lang-bootstrap (which is normally not emitted). This is implemented by adding a new control stanza for lang-bootstrap and specifying --no-package=lang-bootstrap in debian/rules for normal builds, but not for the stage build (which will probably exclude a load of other stuff).
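The decision a bootstrapping tool has to make for a lang | lang-bootstrap dependency can be sketched in Python as follows. The function name and the returned action labels are hypothetical, and recognising staged outputs by a "-bootstrap" suffix is an assumption based on the naming used above.

```python
def resolve(dependency, available):
    """Decide how to satisfy one 'a | b' build-dependency clause.

    Returns ("use", name) if an alternative is already available,
    ("staged-build", name) if a -bootstrap alternative must be built first,
    or ("missing", name) if nothing can satisfy the clause.
    """
    alts = [alt.strip() for alt in dependency.split("|") if alt.strip()]
    for name in alts:
        if name in available:
            return ("use", name)
    for name in alts:
        # Assumption: staged builds emit packages named <lang>-bootstrap.
        if name.endswith("-bootstrap"):
            return ("staged-build", name)
    return ("missing", alts[0])


normal_repo = resolve("lang | lang-bootstrap", {"lang"})
fresh_port = resolve("lang | lang-bootstrap", set())
```

In a normal repository the first alternative wins, so the extra alternative costs nothing outside a bootstrap.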
For documentation issues, being able to specify DEB_BUILD_OPTIONS=nodocs would be simplest. Building with docs affects the dependencies, so it is not like other DEB_BUILD_OPTIONS settings, so perhaps this is not a good mechanism to use? Something generic is attractive if we can make it work.
Size of the problem
We are not sure exactly how many circular dependencies there are in Debian and Ubuntu. For many applications only a relatively small set of packages is needed and some analysis of the problem size will be recorded here in due course.
Whilst there are a lot of loops, they tend to involve the same packages over and over, so our initial estimate is that fewer than 50 packages will need significant work for dependency-cycle breaking. Many more than that need cross-building work.
This is a build-loop graph for openldap, which shows a loop that affects many packages (such as apache2, mozilla, and of course the involved packages). Circles are source packages. Boxes are binary packages listed as build-deps and the diamond is the package that was given to xdeb to generate the initial dependency graph. The weights shown are the number of build-dependencies on other packages. The brightest green are the packages with the smallest number of such dependencies, indicating the likely easiest places to break the loops (because work is only needed in the one package and dependency). An expanded version shows the binary-packages involved in each build dependency.
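The weighting idea can be approximated with a short Python sketch: find the source packages that sit on a build-dependency loop, then rank them by how many build-dependencies they have. The graph below uses placeholder package names and made-up edges purely for illustration; it is not data from the archive or output from xdeb, and the function names are hypothetical.

```python
def reachable(graph, start):
    """All packages reachable from start by following build-dependencies."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def loop_breaking_candidates(graph):
    """Packages on a loop, fewest build-dependencies first."""
    in_loop = [pkg for pkg in graph if pkg in reachable(graph, pkg)]
    return sorted(in_loop, key=lambda pkg: (len(graph[pkg]), pkg))


# Placeholder graph: pkg-a -> pkg-b -> pkg-c -> pkg-a, plus two leaf packages.
example_graph = {
    "pkg-a": {"pkg-b", "leaf-1", "leaf-2"},
    "pkg-b": {"pkg-c", "leaf-1"},
    "pkg-c": {"pkg-a"},
    "leaf-1": set(),
    "leaf-2": set(),
}
candidates = loop_breaking_candidates(example_graph)
```

The package listed first corresponds to the 'brightest green' node in the graph: it has the fewest build-dependencies, so it is likely the cheapest place to add a staged build.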
Thanks to Jonathan Austin, Steve McIntyre, Steve Langasek and Loic Minier for helping clarify the thoughts described above.