This wiki page is intended to be the start of an effort to identify the key points needed to bootstrap Debian from sources.
There is a real need to bootstrap Debian from sources from a porter's point of view. Every new architecture or ABI flavour needs to do this at least once, and making it easier than the current 'really very hard' would be great. It is also very useful for cross-compiling to new or non-self-hosted architectures. For a genuinely new arch, at least part of the system (toolchains+build-essential) has to be cross-built until there is enough to become self-hosting.
Recent new bootstraps have been done for sh4, armhf, uclibc and avr32. More are coming down the line. The subarch flavoured rebuilds are particularly useful on ARM and MIPS architectures.
Bootstrapping is closely tied to support for cross-building Debian packages because at least part of the process must be cross-built. All of build-essential and the toolchain must be cross-buildable.
Debian/Ubuntu cross-building is documented here: https://wiki.linaro.org/CrossBuilding
Patched sources to make Ubuntu Maverick base packages cross-buildable are here: https://launchpad.net/~peter-pearse/+archive/cross-source
For the process to be reliable, cross-dependency metadata needs to be in packages, as specified here: https://wiki.ubuntu.com/MultiarchCross That can't happen until build tools everywhere can parse the decorated build-dependency data, so patches are being maintained outside the main archives as a proof-of-technology.
Bootstrapping could be managed by a tool like xdeb or sbuild, keeping track of staged builds and rebuilding things as needed so that staged builds don't hang around any longer than necessary. However, any such tool could get out of sync with the current status unless that status is always determinable from the current package-set state. This spec attempts to define things such that the state is always intrinsic to the package set. Please speak up if you see ways that this isn't going to work.
The toolchain has a complex bootstrapping process involving binutils, gcc, libc and kernel-headers. It has been fixed up (in Ubuntu/Linaro) to bootstrap itself. So far this has only been demonstrated on armel. Once tested/extended to other architectures it can be uploaded to Debian. This work is ongoing, by Marcin Juszkiewicz and Hector Oron.
The toolchain has also had 'flavoured builds' added so that it is easy to rebuild the toolchain locally for a different default CPU/ISA/optimisation unit (e.g. with/without VFP, or for the v5/v7 instruction set on ARM).
The main issue is circular build-dependencies. These fall into three main areas:
- Most languages depend on themselves to build (gcc, openjdk, mono, haskell, perl, python).
- Libraries sometimes circularly depend:
  kerberos -> ldap -> kerberos
  qt -> libpoppler -> cups -> qt
- Documentation packages. Many packages need documentation tools (sgmltools, jade, tex, doxygen) which cannot be built until many other packages are built.
The generic way to deal with all of these is 'staged builds', where a version of the package is built with lesser functionality and thus a smaller dependency tree. This allows the depending package to be built, after which the 'staged' package can be rebuilt normally.
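The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing tool: the function name plan_builds and both dictionaries are hypothetical, and the claim that kerberos has a reduced build needing no ldap is assumed purely for the example. A staged build is treated as good enough to satisfy the build-dependency of the depending package.

```python
def plan_builds(full_deps, staged_deps):
    """Return an ordered list of (package, kind) steps.

    full_deps:   package -> set of build-dependencies for a normal build
    staged_deps: package -> reduced set for a staged build (hypothetical data)
    """
    available = set()  # packages that exist in some form (staged or normal)
    done = set()       # packages that have had their normal build
    steps = []
    while len(done) < len(full_deps):
        progress = False
        # First build normally everything whose dependencies are present.
        for pkg in sorted(full_deps):
            if pkg not in done and full_deps[pkg] <= available:
                steps.append((pkg, "normal"))
                done.add(pkg)
                available.add(pkg)
                progress = True
        if progress:
            continue
        # Stuck on a cycle: fall back to one staged build and try again.
        for pkg in sorted(staged_deps):
            if pkg not in available and staged_deps[pkg] <= available:
                steps.append((pkg, "staged"))
                available.add(pkg)
                progress = True
                break
        if not progress:
            raise ValueError("cycle cannot be broken with the staged builds given")
    return steps


# The kerberos -> ldap -> kerberos loop, assuming a staged kerberos without ldap.
full = {"kerberos": {"ldap"}, "ldap": {"kerberos"}}
staged = {"kerberos": set()}
for step in plan_builds(full, staged):
    print(step)
```

The plan builds kerberos staged, then ldap normally, then kerberos normally, so the staged package does not hang around longer than necessary.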
A partial spec has been proposed here: https://wiki.ubuntu.com/Specs/M/ARMAutomatedBootstrap This document fills out that spec and proposes some further ideas and changes.
'Staged' builds are invoked by using the DEB_STAGE variable to specify a staged build to dpkg-buildpackage. When it is unset, zero or null, a normal build occurs. Some packages may need more than one pre-stage. We do not know the maximum number of stages needed: it may be two, but to assume so would be foolish, so counting up from negative numbers makes it clear what is going on. So we count up from DEB_STAGE=-n, through DEB_STAGE=-1, to 'normal' (or DEB_STAGE=0). (This is a change from ARMAutomatedBootstrap, which simply has stages counting upwards 1,2,N.)
Any 'staged' package must be identified as such in the metadata so it is not accidentally uploaded as a 'real' package. Is the 'UNRELEASED' codename indicator sufficient, or do we need something more explicit, e.g. an X-Staged-Build:N header?
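As a rough sketch of how a build tool might interpret DEB_STAGE under this numbering: the helper names current_stage and stage_sequence are hypothetical, and rejecting positive values is an assumption made here, since the text above only defines unset, null, zero and negative values.

```python
import os


def current_stage(environ=None):
    """Return 0 for a normal build, or a negative integer for a pre-stage."""
    env = os.environ if environ is None else environ
    raw = env.get("DEB_STAGE", "").strip()
    if raw == "":
        return 0  # unset or null means a normal build
    stage = int(raw)  # raises ValueError on non-numeric input
    if stage > 0:
        # Assumption: positive values are not part of this scheme.
        raise ValueError("DEB_STAGE must be zero or negative, got %d" % stage)
    return stage


def stage_sequence(earliest):
    """List the stages to run in order, ending with the normal build (0)."""
    if earliest > 0:
        raise ValueError("earliest stage must be zero or negative")
    return list(range(earliest, 1))


print(current_stage({"DEB_STAGE": "-2"}))
print(stage_sequence(-2))
```

Counting up towards zero means the final, normal build always has the same number, however many pre-stages a package turns out to need.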
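To make the question concrete, here is a minimal sketch of a pre-upload check that treats either indicator as marking a staged build. It assumes the marker appears as a top-level field in control-style text such as a .changes file; the function name looks_staged is hypothetical, and X-Staged-Build is only the header floated above, not an existing field.

```python
def looks_staged(changes_text):
    """Return True if the metadata marks this as a staged (non-release) build."""
    fields = {}
    for line in changes_text.splitlines():
        # Skip continuation lines (leading whitespace) and anything without a colon.
        if line[:1] in (" ", "\t") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    # Either the proposed explicit header or the UNRELEASED distribution counts.
    return "x-staged-build" in fields or fields.get("distribution") == "UNRELEASED"


print(looks_staged("Source: example-pkg\nDistribution: unstable\nX-Staged-Build: -1\n"))
```

An explicit header has the advantage of recording which stage was built, which 'UNRELEASED' alone cannot do.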
It must be possible for the build-tools to identify what build-stages are available. ARMAutomatedBootstrap proposes Build-Depends-StageN headers, one for each stage. The existence of that defines such a stage as being available.
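A minimal sketch of how a build tool could discover the available stages from those headers follows. The function name available_stages and the sample control stanza (package name and dependencies) are made up for illustration. How the StageN numbers would map onto the negative DEB_STAGE values proposed above is not settled here, so the sketch just reports the N values it finds.

```python
import re

# Field names in control files are not case-sensitive, and fields start in column 0.
STAGE_FIELD = re.compile(r"^Build-Depends-Stage(\d+)\s*:", re.IGNORECASE)


def available_stages(control_text):
    """Return the sorted stage numbers that have a Build-Depends-StageN field."""
    stages = set()
    for line in control_text.splitlines():
        match = STAGE_FIELD.match(line)
        if match:
            stages.add(int(match.group(1)))
    return sorted(stages)


# Hypothetical source stanza, for illustration only.
SAMPLE_CONTROL = """\
Source: example-pkg
Build-Depends: debhelper, libfoo-dev, doxygen
Build-Depends-Stage1: debhelper
Build-Depends-Stage2: debhelper, libfoo-dev
"""

print(available_stages(SAMPLE_CONTROL))
```

Because the answer comes from the package metadata alone, a tool never needs to remember separately which stages a package supports.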
For documentation issues, being able to specify DEB_BUILD_OPTIONS=nodocs would be simplest. However, building with docs affects the dependencies, so it is not like other DEB_BUILD_OPTIONS settings, and perhaps this is not a good mechanism to use? Something generic is attractive if we can make it work.
Size of the problem
We are not sure exactly how many circular dependencies there are in Debian and Ubuntu. For many applications only a relatively small set of packages is needed and some analysis of the problem size will be recorded here in due course.
We expect that fewer than 50 packages will need significant work.