Differences between revisions 30 and 62 (spanning 32 versions)
Revision 30 as of 2012-05-15 19:13:38
Size: 14666
Editor: wookey
Comment:
Revision 62 as of 2020-09-01 18:49:38
Size: 16948
Editor: HelmutGrohne
Comment: fix some old explanations, delete unfixably outdated sections
Line 1: Line 1:
#language en
Line 4: Line 5:
This wiki page is intended to describe the issues of, and mechanisms for, bootstrapping Debian from zero to a full archive. This is otherwise known as '''profile builds''' (and previously '''staged builds'''). The approach has developed over several years as practical experience has grown; it is now technically usable and making its way into the archive.

The detailed spec is at BuildProfileSpec.

<<TableOfContents()>>
Line 6: Line 13:
This wiki page is intended to describe the issues of, and mechanisms for, bootstrapping a Debian rootfs image from sources. Otherwise known as '''staged builds'''. There is a real need to bootstrap Debian from sources when doing new ports or flavours (sub-arch builds). Every new architecture or optimisation flavour needs to do this at least once, and making it easier than the current 'really very hard' would be great. It is also very useful for cross-compiling to new or non-self-hosted architectures, and for a genuinely new arch at least part of the system (toolchains+build-essential) has to be cross-built until there is enough to become self-hosting.
Line 8: Line 15:
There is a real need to bootstrap Debian from sources when doing new ports or flavours. Every new architecture or optimisation flavour needs to do this at least once, and making it easier than the current 'really very hard' would be great. It is also very useful for cross-compiling to new or non-self-hosted architectures, and for a genuinely new arch at least part of the system (toolchains+build-essential) has to be cross-built until there is enough to become self-hosting.

Recent new bootstraps have been done for sh4, armhf, uclibc and avr32. More are coming down the line, including the imminent arm64. Subarch flavoured rebuilds (e.g. to optimise for a particular CPU) are particularly useful on ARM and MIPS architectures.
New bootstraps are done every year. At the time of writing [[SH4|sh4]], [[ArmHardFloatPort|armhf]], [[https://alioth.debian.org/projects/i386-uclibc|uclibc]], [[http://avr32.debian.net/|avr32]] and x32 have been done, and [[Arm64Port|arm64]] and mips64el are in progress. Subarch flavoured rebuilds (e.g. to optimise for a particular CPU) are particularly useful on ARM and MIPS architectures (the Raspberry Pi is the most popular recent example).
Line 14: Line 19:
Currently people tend to use non-debian tools (such as Yocto/gentoo/OpenEmbedded) to get a basic rootfs image of the target arch/ABI then do native building within that. This works but needs a great deal of manual loop-breaking and we really out to be able to bootstrap our own OS. It can also help to solve problems like libc ABI changes by rebuilding everything. Example: https://lists.debian.org/20140714203645.GA23576@hall.aurel32.net (In the end, though, this particular case was solved by upstream reverting the change.)

Currently people tend to use non-Debian tools (such as Yocto/Gentoo/OpenEmbedded) to get a basic rootfs image of the target arch/ABI and then do native building within that. This works, but needs a great deal of manual loop-breaking, and we really ought to be able to bootstrap our own OS.
Line 18: Line 25:
This work does need build-system and policy changes, which are detailed on this page. This work needs build-system and policy changes, which are detailed on this page.
Line 20: Line 27:
== Packaging principles == The goal is to be able to run regular automated bootstrap builds of the archive to check that it works, and allow porters to concentrate on actual porting work, rather than a lot of associated aggravation.
Line 22: Line 29:
An important principle is that the packaging changes necessary for this to work are reasonably clear and transparent. A Debian packager should not have to understand this stuff (staged builds and cross-building) in loving detail to avoid breaking things whilst making maintenance changes. Considering this principle helps when deciding between different technically-satisfactory ways of achieving things. == Status ==

The current (Jan 2014) status is:

 * build profiles are supported in dpkg 1.17.2 onwards and apt 0.9.16 onwards
 * Support in the buildd infrastructure is still needed
 * Suite bootstrappability is tracked at https://bootstrap.debian.net/
 * Architecture cross-bootstrap is tracked by [[HelmutGrohne/rebootstrap|rebootstrap]]
 * Debian bootstrap from other operating systems is done by Daniel Schepler
 * Bugs are tracked at https://bugs.debian.org/cgi-bin/pkgreport.cgi?users=bootstrap@debian.org;tag=bootstrap

== Packaging/Design principles ==

An important principle in the design is that the packaging changes necessary for this to work are reasonably clear and transparent. A Debian packager should not have to understand this stuff (staged builds and cross-building) in loving detail to avoid breaking things whilst making maintenance changes.
Line 25: Line 45:

These principles informed the design when deciding between different technically-satisfactory ways of achieving things.
Line 29: Line 51:
The concept is simple: add support for minimal/reduced/staged builds to packages involved in build-dependency loops, so that build loops are broken. Also ensure packages cross-build properly so that an initial native-building system can be produced. The concept is simple: add support for minimal/reduced/staged builds to packages involved in build-dependency loops, so that build loops are broken. Also ensure packages cross-build properly so that an initial minimal native-building system can be produced.
Line 31: Line 53:
Working out which packages to modify, and how, is a manual process, done by examining build-dep loops and choosing which packages are most easily and cleanly modified. Once that is done, building bootstrap-able packages is an automatable process. Working out which packages to modify, and how, is fundamentally a manual process, done by manually examining build-dep loops and choosing which packages are most easily and cleanly modified. Since the "[[https://blog.mister-muffin.de/2013/01/25/bootstrappable-debian---new-milestone/|Port bootstrap build-ordering tool]]" [[SummerOfCode2012/StudentApplications/JohannesSchauer|project]] in the [[SummerOfCode2012|GSoC 2012]], the process of finding source packages which make sense to modify can be semi-automated. Once that is done, building bootstrap-able packages can be a fully automated process.
Line 33: Line 55:
This spec caters for multiple stages of staged/bootstrap build, so that if necessary a package can have stage-1, stage-2, etc before the final, normal, build. Almost all packages only need one stage other than the standard build. Only toolchain packages are known to have more than one stage at this time. This spec caters for multiple stages of staged/bootstrap build, so that if necessary a package can be built with dependency-reducing build profiles before the final, normal build.
Line 35: Line 57:
The reduced dependencies are specified in the control file, using normal Build-Depends: syntax, in new fields named Build-Depends-Stage1, Build-Depends-Stage2 etc. The reduced dependencies are specified in the control file, using a concept which is called '''build profiles'''. The build profile format was proposed by Guillem Jover together with other solutions he presented in bug#DebianBug:661538. Build profiles extend the Build-Depends format with a syntax similar to architecture restrictions, but using < and > instead of [ and ]. The current spec is detailed in BuildProfileSpec. Previous revisions were discussed on debian-devel and debian-dpkg.
Line 37: Line 59:
An environment variable (DEB_BUILD_OPTIONS) is used to control when packages are built in reduced staged/bootstrap mode, and at what stage.. debian/rules can check this variable and miss out some optional features to reduce the dependency tree (e.g. building kerberos without LDAP support). dpkg-buildpackage/dpkg-checkbuilddeps also checks the reduced/changed build-dependencies instead of the normal ones. {{{
Build-Depends: huge (>= 1.0) [i386 arm] <!cross !pkg.somepkg.nosomefeature>, tiny
}}}
Line 39: Line 63:
DEB_BUILD_OPTIONS is used because it is aready preserved and passed on by all build tools, and this is a build option like other things it is used to control. The build dependency "huge" would not be required by the source package if it is built in the "cross" or "pkg.somepkg.nosomefeature" profile. This mechanism neatly allows for removed build-deps, replaced build-deps and added build-deps, and an arbitrary number of possible 'profiles'.
Line 41: Line 65:
So setting DEB_BUILD_OPTIONS=stage1 will cause dpkg-buildpackage to call dpkg-checkbuilddeps with --stage=1 so that Build-Depends-Stage1 dependencies are checked, rather than the normal set. Similarly other tools like apt and xdeb will use the Build-Depends-StageN dependencies for the top-level package. Besides bootstrapping, these build profiles can also be used for embedded builds, to allow for changed build-deps when cross-building, and for build-dependencies that are only needed for running tests.
Line 43: Line 67:
Bootstrapped/Staged packages are marked as such (in the control file) with X-Staged-Build=N and not uploaded to normal repositories. It is important to avoid accidentally mistaking a bootstrap/staged package for a 'real' (normally-built) package. As soon as possible a bootstrap package should be rebuilt as a full package, to avoid having to rebuild many packages aginst the full version once it is available. The details for this are not fleshed-out, but an extra control header seems the obvious thing to do. A version suffix may be useful too, mostly to help humans. This scheme supersedes an earlier version (referred to as 'staged' builds), which used repeated Build-Depends-StageN: lines. See the dpkg bug#DebianBug:661538 for the evolution of this. The profile labels are arbitrary, but agreement on label usage is necessary. The profile namespace is defined in the BuildProfileSpec.

An environment variable (DEB_BUILD_PROFILES) is used to control when packages are built in reduced staged/bootstrap mode, and at what stage. debian/rules can check this variable and miss out some optional features to reduce the dependency tree (e.g. building kerberos without LDAP support, avahi without gtk and qt, etc.). dpkg-buildpackage/dpkg-checkbuilddeps also check the reduced/changed build-dependencies instead of the normal ones.

DEB_BUILD_PROFILES is used because it is preserved and passed on by all build tools, and because selecting a profile is a build option in the same sense as the other options such variables control.

So setting `DEB_BUILD_PROFILES=cross` will cause `dpkg-buildpackage` to call `dpkg-checkbuilddeps` with `-Pcross`, so that the build dependencies for the cross profile are checked rather than the normal set.
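The following Python sketch shows the kind of decision a build script makes from DEB_BUILD_PROFILES: the variable holds a space-separated list of profile names, and optional features are dropped when a reducing profile is active. The profile-to-flag mapping, the pkg.example.nogui profile and the configure flags are hypothetical examples modelled on the kerberos/LDAP and avahi cases mentioned above. In a real debian/rules the same test is normally written with make's $(filter ...) function.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical mapping from build profile to the configure flags that drop an
# optional feature (and with it, a build dependency).
PROFILE_FLAGS = {
    "stage1": ["--without-ldap"],
    "pkg.example.nogui": ["--disable-gtk", "--disable-qt"],
}


def active_profiles(environ: Mapping[str, str] | None = None) -> set[str]:
    """Read DEB_BUILD_PROFILES (a space-separated list) from the environment."""
    env = os.environ if environ is None else environ
    return set(env.get("DEB_BUILD_PROFILES", "").split())


def configure_flags(profiles: set[str]) -> list[str]:
    """Collect the reducing flags for every active profile, in a stable order."""
    flags = []
    for profile in sorted(profiles):
        flags.extend(PROFILE_FLAGS.get(profile, []))
    return flags


print(configure_flags(active_profiles({"DEB_BUILD_PROFILES": "stage1 nocheck"})))  # ['--without-ldap']
```

Profiles the script does not know about (here nocheck) are simply ignored, so the same script works unchanged in a normal build.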

Bootstrapped/Staged packages are marked as such (in the control file) with a new field called Built-For-Profiles. They are not uploaded to normal repositories. It is important to avoid accidentally mistaking a bootstrap/staged package for a 'real' (normally-built) package. As soon as possible a bootstrap package should be rebuilt as a full package, to avoid having to rebuild many packages against the full version once it is available. A version suffix is no longer useful, because the .buildinfo file tracks the profiles used for building; a version suffix would also break reproducibility.
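As a sketch of how a repository or upload tool might guard against mistaking a bootstrap build for a normal one, the Python below parses a single binary-package control stanza and reports the profiles listed in its Built-For-Profiles field. It assumes the field holds a space-separated list of profile names. The parser is deliberately minimal (one stanza, simple continuation lines), and the function names and package names are hypothetical.

```python
def parse_stanza(text: str) -> dict[str, str]:
    """Minimal deb822-style parser for one stanza; field names are case-insensitive."""
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the stanza
            continue  # skip leading blank lines
        if line[0] in " \t" and current is not None:
            fields[current] += "\n" + line.strip()  # continuation line
        else:
            name, _, value = line.partition(":")
            current = name.strip().lower()
            fields[current] = value.strip()
    return fields


def built_for_profiles(stanza: str) -> list[str]:
    """Profiles a binary package was built with; empty for a normal build."""
    return parse_stanza(stanza).get("built-for-profiles", "").split()


def is_bootstrap_build(stanza: str) -> bool:
    return bool(built_for_profiles(stanza))


stanza = "Package: libexample1\nVersion: 1.0-1\nBuilt-For-Profiles: stage1 nocheck\n"
print(built_for_profiles(stanza))  # ['stage1', 'nocheck']
print(is_bootstrap_build("Package: libexample1\nVersion: 1.0-1\n"))  # False
```

A normally built package carries no such field, so an upload tool could refuse any package for which is_bootstrap_build() returns True.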
Line 47: Line 79:
This process is sometimes called 'staged' builds as well as 'bootstrap' builds. Exact field names and variable names is a subject for bikeshedding. Whatever it most likely to be clear to developers and not clash with other purposes is best. This process is usually called 'profile' builds, but the terms 'staged' builds and 'bootstrap' builds have also been used as it developed.
Line 51: Line 83:
Proof-of-concept patches for packages that need to understand the new fields have been made. They are here:
http://wookware.org/software/cyclicdeps/patches/
Patches are tracked on BuildProfileSpec.
Line 54: Line 85:
 * dpkg bug: DebianBug:661538
 * apt bug: DebianBug:661537
 * python-debian bug: coming soon
 * xdeb bug: DebianBug:669250
The dpkg bug gives a useful idea of how this spec has developed.
Line 59: Line 87:
Here are various older patches adding support for profile builds to packages:
https://people.debian.org/~wookey/bootstrap/patches/profiles/packages/
Line 66: Line 96:
Debian/Ubuntu cross-building is documented here: https://wiki.linaro.org/CrossBuilding Debian/Ubuntu cross-building is documented here: https://wiki.linaro.org/Platform/DevPlatform/CrossCompile/CrossBuilding
Line 68: Line 98:
Patched sources to make Ubuntu Maverick base packages cross-buildable are here:
https://launchpad.net/~peter-pearse/+archive/cross-source
Patched sources with build-profile patches applied and extra multiarch and profile build support in package metadata are in https://people.debian.org/~wookey/bootstrap.html
Line 71: Line 100:
For cross-building to be reliable cross-dependency metadata needs to be in packages, so that it is clear whether a build dependency should be satisfied by the build architecture or the host architecture. Multiarch information can be used to provide this information along with build-dependency decoration for the farily rare exceptions. Details are specified here: https://wiki.ubuntu.com/MultiarchCross For cross-building to be reliable cross-dependency metadata needs to be in packages, so that it is clear whether a build dependency should be satisfied by the build architecture or the host architecture. Multiarch information can be used to provide this information along with build-dependency decoration for the fairly rare exceptions. Details are specified here: UbuntuWiki:MultiarchCross
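The core decision described here (whether a build dependency is satisfied for the build or the host architecture) can be sketched in a few lines of Python. This is a simplified model that assumes the usual multiarch rules: a dependency annotated :native, a Multi-Arch: foreign package, or an :any dependency on a Multi-Arch: allowed package is installed for the build architecture, and everything else for the host architecture. It ignores Arch: all packages, alternatives and version constraints, and the class, function and package names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BinaryPackage:
    name: str
    multi_arch: str = "no"  # one of "no", "same", "foreign", "allowed"


def install_arch(dep: str, pkg: BinaryPackage, build_arch: str, host_arch: str) -> str:
    """Simplified choice of architecture for one build dependency when cross-building."""
    _, _, qualifier = dep.partition(":")
    if qualifier == "native":
        return build_arch  # explicitly requested for the build machine
    if pkg.multi_arch == "foreign":
        return build_arch  # any architecture satisfies it; pick one that can run
    if qualifier == "any" and pkg.multi_arch == "allowed":
        return build_arch  # e.g. an interpreter used as a build tool
    return host_arch  # libraries and headers to link against


print(install_arch("libbar-dev", BinaryPackage("libbar-dev", "same"), "amd64", "arm64"))  # arm64
print(install_arch("interp:any", BinaryPackage("interp", "allowed"), "amd64", "arm64"))   # amd64
```

The default falls through to the host architecture, which is why only the fairly rare exceptions need explicit decoration.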
Line 73: Line 102:
The current state of buildability using that technology is recorded here: http://people.linaro.org/~wookey/buildd/ The current state of buildability using this technology is recorded here: http://people.linaro.org/~wookey/buildd/ and http://people.canonical.com/~cjwatson/cross/armhf/raring/
Line 75: Line 104:
Until that metadata is in packages, heurisitics must be used, as implemented in xdeb (and the now-deprecated apt-cross), or all dependencies must be installed for both host and native, as implemented in xapt. These are all ugly and horrid, but better than nothing. Until useful amounts of multiarch metadata are in packages, heuristics must be used, as implemented in xdeb (and the now-deprecated apt-cross), or all dependencies must be installed for both host and native, as implemented in xapt. These are all ugly and horrid, but better than nothing. xdeb and xapt work better than multiarch-cross in wheezy and precise, but multiarch-cross is where all the current work is, and as of the end of 2012 it works for much of the base system in Debian unstable and Ubuntu raring. From Maverick and Jessie onwards multiarch cross-building is the recommended method.
Line 79: Line 108:
The full automated bootstrapping process needs to keep track of staged builds and rebuilding things as needed so that they don't hang around any longer than necessary. However any such tool could get out of sync with the current status, unless it is always determinable from the current package-set state. This spec attempts to define things such that it is always intrinsically stateful. Please speak up if you see ways that this isn't going to work. The full automated bootstrapping process needs to keep track of staged builds and rebuilding things as needed so that they don't hang around any longer than necessary. However any such tool could get out of sync with the current status, unless it is always determinable from the current package-set state. This spec attempts to define things such that it is always intrinsically stateful.
Line 81: Line 110:
Packges do need to be uploaded to a local repository in order to satisfy build-dependecies (that's the whole point). However because packages of the same version get built, and thus uploaded, more than once we need a way to deal with this. Proposed is to append ~stageN+M to the package version automatically, where N is the stage number and M a continuously incremented (by the buildd) number. Or do binNMUs, which are already recognised by the tools and package management system very well, and almost all packages are (supposedly) binNMU safe. Packages do need to be uploaded to a bootstrap repository in order to satisfy build-dependencies (that's the whole point). However, because packages of the same version get built, and thus uploaded, more than once, we need a way to deal with this. One proposal is to append ~stageN+M to the package version automatically, where N is the stage number and M a number continuously incremented by the buildd. Another is to do binNMUs, which are already recognised very well by the tools and package management system; almost all packages are (supposedly) binNMU-safe. This was awkward when combined with multiarch until the separate binNMU changelog feature was implemented.
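The ~stageN+M proposal relies on Debian's version ordering, in which ~ sorts before everything (even the end of the string). A staged build therefore sorts below the final build of the same version, while a binNMU suffix such as +b1 sorts above it. Below is a Python transcription of the comparison rules dpkg applies (its verrevcmp routine), sketched for ASCII version strings only. For real tooling a library implementation such as python-apt's apt_pkg.version_compare is the safer choice.

```python
from string import ascii_letters, digits


def _isdigit(c: str) -> bool:
    return c != "" and c in digits


def _order(c: str) -> int:
    """Sort weight of one character ('' stands for the end of the string)."""
    if c == "~":
        return -1  # tilde sorts before everything, even the end of the string
    if c == "" or _isdigit(c):
        return 0
    if c in ascii_letters:
        return ord(c)
    return ord(c) + 256  # other punctuation sorts after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare upstream-version or revision strings the way dpkg does."""
    def at(s: str, k: int) -> str:
        return s[k] if k < len(s) else ""

    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit prefixes character by character.
        while (at(a, i) and not _isdigit(at(a, i))) or (at(b, j) and not _isdigit(at(b, j))):
            ac, bc = _order(at(a, i)), _order(at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the following digit runs numerically.
        while at(a, i) == "0":
            i += 1
        while at(b, j) == "0":
            j += 1
        first_diff = 0
        while _isdigit(at(a, i)) and _isdigit(at(b, j)):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _isdigit(at(a, i)):
            return 1  # a has the longer number
        if _isdigit(at(b, j)):
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> tuple[int, str, str]:
    """Split [epoch:]upstream[-revision] into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Negative if a < b, zero if equal, positive if a > b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


print(compare_versions("1.0-1~stage1+1", "1.0-1") < 0)  # True
print(compare_versions("1.0-1+b1", "1.0-1") > 0)        # True
```

The incrementing M part keeps successive staged uploads of the same version ordered among themselves, while the final build always supersedes all of them.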
Line 83: Line 112:
See 'Changed binary packages' below for a discussion of how profiles causing a binary package normally built to be omitted should be dealt with. The meta-data in package Build-Deps allows static analysis tools to determine in advance if the package set has a fully linearisable build-order or not. The [[https://gitlab.mister-muffin.de/debian-bootstrap/botch|botch]] tool can do this analysis.
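The underlying check is a topological sort of the build-dependency graph: if the graph contains a cycle, no build order exists until some profile removes an edge. botch does far more than this, working on the full package metadata. The toy Python sketch below only shows the principle, using the standard library's graphlib (Python 3.9+). Its two-node example is the krb5/openldap loop (krb5 needs libldap2-dev, openldap needs libkrb5-dev), and the function names are hypothetical.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter


def build_order(deps: dict[str, set[str]]) -> list[str] | None:
    """Return a valid build order, or None if the graph contains a cycle.

    deps maps each source package to the source packages it build-depends on.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


def find_cycle(deps: dict[str, set[str]]) -> list[str] | None:
    """Return one dependency cycle as reported by graphlib, or None."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as exc:
        return exc.args[1]
    return None


# Normal build: krb5 and openldap build-depend on each other.
full = {"krb5": {"openldap"}, "openldap": {"krb5"}}
# A reduced (e.g. stage1) krb5 build that drops the LDAP plugin no longer
# needs openldap, which breaks the loop.
reduced = {"krb5": set(), "openldap": {"krb5"}}

print(build_order(full))     # None
print(build_order(reduced))  # ['krb5', 'openldap']
```

After this order has been built, krb5 is rebuilt normally once openldap is available, so that the reduced package does not hang around.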
Line 84: Line 114:
== Toolchain == == Heuristics to detect bootstrapping problems ==
Line 86: Line 116:
The toolchain has a complex 2 or 3-stage bootstrapping process involving binutils, gcc, libc and kernel-headers. It has been fixed up (in the Ubuntu maverick packaging onwards) to bootstrap itself. It currently uses the DEB_STAGE variable name internally to control the build. Changing that to DEB_BUILD_OPTIONS=stage=N is not difficult. Ideally the following would be converted into a tool that detects and reports these problems, so that maintainers can fix them. The results could be shown on a site such as the PTS or tracker.debian.org.
Line 88: Line 118:
Using multiarch to supply cross-libc actual removes some of the complexity of cross-toolchain building, and is the subject of a 2012 GSOC project.
  * If a package contains perl .so files (XS modules; the same could be done for python or other language modules) and is Build-Depended on by another package, it needs to be a build tool or be used by one, because the program being built should not link against those modules. Those Build-Depends need to be marked :native (reliable).
  * Packages that, once irrelevant stuff like /usr/share/doc is excluded, only contain *.so* files need to be tagged M-A:same (reliable).
  * If an Arch:any package, once irrelevant stuff is excluded, only contains scripts in bin/* and those scripts have matching digests on different architectures, it is probably M-A:foreign safe (the scripts might still call architecture-specific interfaces) (guess).
  * If a package has unsatisfiable build dependencies and all packages in its build-dependency chain but one are marked with M-A values, then the unmarked package is at fault (reliable).
  * A hard-coded list of interpreters could be maintained; those would need to be marked M-A:allowed (reliable).
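Two of the file-list heuristics above are simple enough to sketch in Python. The input format (a list of installed paths, and per-architecture maps from path to checksum) and the exact set of "irrelevant" directories are assumptions for illustration. A real tool would derive them from the .deb packages themselves, and the function names are hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase

# Assumed examples of "irrelevant stuff" that does not affect Multi-Arch.
IRRELEVANT_PREFIXES = ("/usr/share/doc/", "/usr/share/man/", "/usr/share/lintian/")


def _relevant(paths):
    return [p for p in paths if not p.startswith(IRRELEVANT_PREFIXES)]


def suggest_ma_same(paths: list[str]) -> bool:
    """Heuristic (reliable): only shared-library files are left, so M-A: same."""
    rel = _relevant(paths)
    return bool(rel) and all(fnmatchcase(posixpath.basename(p), "*.so*") for p in rel)


def suggest_ma_foreign(digests_by_arch: dict[str, dict[str, str]]) -> bool:
    """Heuristic (guess): identical files in bin/ directories on every architecture."""
    tables = [
        {p: d for p, d in table.items() if not p.startswith(IRRELEVANT_PREFIXES)}
        for table in digests_by_arch.values()
    ]
    if len(tables) < 2 or not tables[0]:
        return False
    only_bin = all(posixpath.basename(posixpath.dirname(p)) == "bin" for p in tables[0])
    return only_bin and all(t == tables[0] for t in tables[1:])


print(suggest_ma_same(["/usr/lib/x86_64-linux-gnu/libfoo.so.1", "/usr/share/doc/libfoo1/copyright"]))  # True
```

The foreign check needs at least two architectures to compare, which matches its status as a guess rather than a reliable rule.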
Line 90: Line 124:
The cross-toolchain has also had 'flavoured builds' added so that it is easy to rebuild the tolchain locally for a different default CPU/ISA/optimisation unit. (e.g with/without VFP or for v5/v7 instruction set on ARM). == History ==
Line 92: Line 126:
== Circular dependencies/staged builds == This section collects some information about how this concept has developed over time in order to make the above page clearer.
Line 94: Line 128:
The main issue is circular build-dependencies. These fall into three main areas:
 * Most languages depend on themselves to build (gcc, openjdk, mono, haskell, perl, python, ada(gnat)).
 * Libraries sometimes circularly depend:<<BR>>
    kerberos -> ldap -> kerberos<<BR>>
    qt -> poppler -> cups -> qt
 * Documentation packages. Many packages need documentation tools (sgmltools, jade, tex, doxygen) which cannot be built until many other packages are built. This is largely only a problem for native-builds, as the doc-tools are generally available when cross-building.
The initial proposal was written down in 2010 as a result of a UDS (Ubuntu Developer Summit) discussion: UbuntuWiki:Specs/M/ARMAutomatedBootstrap
Back then the concept was called 'Staged Builds'. It used a simple extra line in the control file, Build-Depends-StageN, which repeated and modified the normal build-dependencies,
and builds were controlled with DEB_BUILD_OPTIONS=profile=stage1.
Line 101: Line 132:
The generic way to deal with all of these is 'staged builds', where a version of the package is built with lesser functionality and thus a smaller dependency tree. This allows the depending package to then be built, then for the 'staged' package to be built normally. This made it very easy to implement, as existing tools would not barf on it because no new syntax was used. However, it was an ugly implementation inside dpkg, with many extra fields defined and a limit (of 2) on how many stages could be supported. No other profile types were possible without hard-defining more fields inside dpkg.
See DebianBug:661538 for how that developed.
Line 103: Line 135:
This could be controlled by a tool that keeps track of which packages have currently been built as 'staged' packages and thus need rebuilding, but if we can correctly encode things in dependencies then this process can be made automatic and intrinsic. Exactly how this needs to be done is the subject of ongoing study. Three successive years of GSoC projects, by Gustavo Alkmim, P. J !McDermott, Johannes Schauer and Thibaut Girka, progressed the practical and theoretical work, as did heavy sponsorship of this work by Linaro and Canonical, which enabled a lot of the supporting cross-building and multiarch work to get done.
Line 105: Line 137:
A partial spec has been proposed here: https://wiki.ubuntu.com/Specs/M/ARMAutomatedBootstrap
This document fills out that spec and proposes some further ideas and changes.
It became clear from actually doing staged builds that it made more sense if cross-building, test-skipping, doc-skipping and language/tool extra dependencies (bootstrapping) changes were described independently rather than arbitrarily lumped together in the way a bootstrap on a particular arch might like it. So the newer profile modifier syntax was developed and implemented, whilst being tested on actual cross-builds and bootstraps.
Line 108: Line 139:
CircularBuildDependencies is a list of loops found in the last analysis (run in early 2011).

== Specifying stages ==

'Staged' builds are invoked by setting DEB_BUILD_OPTIONS=stage=N (where N is '1' or '2') to specify a staged build to dpkg-buildpackage. When no 'stage=N' option is set then a normal build occurs. Some packages may need more than one staged build. We do not know what the maximum number of stages needed is: it is proably two, but to assume so would be foolish. We count up from Stage1, Stage2 to 'normal'. Hopefully this is reasonably clear to the average packager.

Any 'staged' package must be identified as such in the metadata so it is not accidentally uploaded as a 'real' package. e.g. a X-Staged-Build=N header?

It must be possible for the build-tools to identify what build-stages are available. We propose Build-Depends-StageN headers, one for each stage. The existence of that defines such a stage as being available.

Let's consider kerberos as a typical example of a library package involved in a circular dependency. krb5 needs libldap2-dev to build (from openldap). openldap need libkrb5-dev (from krb5) to build. To fix this we add a staged build to krb5 to miss out the generation of the krb5-ldap package. This is easy to do with a debhelper-based package by simply setting DH_OPTIONS="--no-package=krb5-ldap", and running configure with --without-ldap (when DEB_BUILD_OPTIONS=stage=1).


=== Dealing with changed build dependencies ===

Build-Depends-StageN simply list all the build-dependencies again except changing or missing out some as required. This does need to be maintained along with the normal build-depends. This makes it very easy to implement. It would be nice to just list a 'diff' from the normal build-dependency list - i.e. 'except package-foo' or 'package-minimal instead of package'. I'm not sure this is practical, but if anyone can work out how to do it...

So for krb5 we'd add:
{{{
Build-Depends-Stage1:debhelper (>= 7), byacc | bison, comerr-dev, docbook-to-man,
 libkeyutils-dev [!kfreebsd-i386 !kfreebsd-amd64 !hurd-i386]
 libncurses5-dev, libssl-dev, ss-dev, texinfo
}}}



For packages which depend on themselves (usually languages), the Build-dependencies should be changed to depend on lang | lang-bootstrap. In a normal repository the (native version) lang-bootstrap will not be available so a lang will be used. In a bootstraping environment lang may well not be available in which case lang-bootstrap needs to be built. The bootstraping tool knows to do a staged build in this case.

Setting DEB_BUILD_OPTIONS=stage=1 and building this package causes it to produce lang-bootstrap (which is normally not emitted). This is implemented by adding a new control stanza for lang-bootstrap and specifying --no-package=lang-bootstrap in debian/rules for normal builds, but not for the stage build (which will probably exclude a load of other stuff).

=== Documentation loops ===

For documentation issues being able to specify DEB_BUILD_OPTIONS=nodocs would be simplest. Building with docs affects the dependencies, so it is not like other DEB_BUILD_OPTIONS, so perhaps this is not a good mechanism to use? Something generic is attractive if we can make it work.

Documentation loops are primarily an issue for native building, although they do cause issues for cross-building too (gobject introspection, perl module docs).
During 2013 a concrete spec was pinned down, agreed and implemented in dpkg, leading to a profile-supporting dpkg upload at the end of 2013.
Line 153: Line 149:
  * 2011.07.29
    [[http://penta.debconf.org/dc11_schedule/events/745.en.html|Talk by Wookey at Debconf 2011 on this subject]]
Line 156: Line 150:
    http://wiki.debian.org/SummerOfCode2011/AutomatedBootstrapping/GustavoAlkmim     SummerOfCode2011/AutomatedBootstrapping/GustavoAlkmim
  * [[MultiarchCrossToolchainBootstrap|How to bootstrap cross toolchain]]

== Talks ==

 * mini-DebConf Curitiba, Brazil 2017: "[[DebianEvents/br/2017/MiniDebconfCuritiba/Atividades/CFP#Bootstrapping_novas_arquiteturas_em_Debian|Bootstrapping novas arquiteturas em Debian]]" (Bootstrapping new architectures in Debian) by Breno Henrique Leitão and Fernando Seiti Furusato
 * mini-DebConf Paris, France 2012-11-25: [[http://fr2012.mini.debconf.org/#schedule|Bootstrapping Debian (or derivatives) for a new architecture]] by Pietro Abate ([[http://fr2012.mini.debconf.org/slides/bootstrapping-new-arch.pdf|slides]], [[http://meetings-archive.debian.net/pub/debian-meetings/2012/mini-debconf-paris/pietro-abate.ogv|video]])
 * DebConf11 2011-07-29: [[http://penta.debconf.org/dc11_schedule/events/745.en.html|Bootstrappable Debian]] by Wookey ([[http://wookware.org/talks/bootstrappable-Debconf11.tar.gz|slides]], [[http://meetings-archive.debian.net/pub/debian-meetings/2011/debconf11/high/745_Bootstrapable_Debian.ogv|video]])

== Links ==

 * [[HelmutGrohne/rebootstrap|rebootstrap]]: constantly re-bootstrapping Debian from Debian amd64 to all architectures, via cross-builds
 * [[http://bootstrappable.org/|Bootstrappable builds]]
Line 160: Line 166:
Thanks to Jonathan Austin, Steve McIntyre, Steve Lanagsek and Loic Minier for helping clarify the thoughts described above. Thanks to Jonathan Austin, [[SteveMcIntyre|Steve McIntyre]], Steve Langasek, Loic Minier, [[PJMcDermott|P. J. McDermott]], and Johannes Schauer for helping clarify the thoughts described above.
Line 164: Line 170:
CategoryEmdebian CategoryDebianDevelopment
CategoryEmdebian | CategoryDeveloper

Bootstrapping

This wiki page is intended to describe the issues of, and mechanisms for, bootstrapping Debian from zero to a full archive. This is otherwise known as profile builds (and previously staged builds). The approach has developed over several years as practical experience has grown; it is now technically usable and making its way into the archive.

The detailed spec is at BuildProfileSpec.

Overview

There is a real need to bootstrap Debian from sources when doing new ports or flavours (sub-arch builds). Every new architecture or optimisation flavour needs to do this at least once, and making it easier than the current 'really very hard' would be great. It is also very useful for cross-compiling to new or non-self-hosted architectures, and for a genuinely new arch at least part of the system (toolchains+build-essential) has to be cross-built until there is enough to become self-hosting.

New bootstraps are done every year. At the time of writing sh4, armhf, uclibc, avr32 and x32 have been done, and arm64 and mips64el are in progress. Subarch flavoured rebuilds (e.g. to optimise for a particular CPU) are particularly useful on ARM and MIPS architectures (the Raspberry Pi is the most popular recent example).

This is also helpful when bringing a lagged architecture up to date, especially with regard to documentation tools (which may be too old) and optional dependencies (which may be too old or may not exist), such as php5, which depends on everything and the kitchen sink. I wish this had been implemented when starting to work on m68k…

It can also help to solve problems like libc ABI changes by rebuilding everything. Example: https://lists.debian.org/20140714203645.GA23576@hall.aurel32.net (In the end, though, this particular case was solved by upstream reverting the change.)

Currently people tend to use non-Debian tools (such as Yocto/Gentoo/OpenEmbedded) to get a basic rootfs image of the target arch/ABI and then do native building within that. This works, but needs a great deal of manual loop-breaking, and we really ought to be able to bootstrap our own OS.

Putting the necessary bootstrapping metadata and build rules into the packages themselves in an orderly fashion enables the info to be maintained easily. QA tests to report on breakage will help enormously here. It also makes for a repeatable and deterministic process.

This work needs build-system and policy changes, which are detailed on this page.

The goal is to be able to run regular automated bootstrap builds of the archive to check that it works, and allow porters to concentrate on actual porting work, rather than a lot of associated aggravation.

Status

The current (Jan 2014) status is:

Packaging/Design principles

An important principle in the design is that the packaging changes necessary for this to work are reasonably clear and transparent. A Debian packager should not have to understand this stuff (staged builds and cross-building) in loving detail to avoid breaking things whilst making maintenance changes.

All the metadata needed should be part of the packages, so it can be maintained over time. Any solution with external patches/metadata is doomed to bitrot.

These principles informed the design when deciding between different technically-satisfactory ways of achieving things.

Mechanisms

The concept is simple: add support for minimal/reduced/staged builds to packages involved in build-dependency loops, so that build loops are broken. Also ensure packages cross-build properly so that an initial minimal native-building system can be produced.

Working out which packages to modify, and how, is fundamentally a manual process, done by manually examining build-dep loops and choosing which packages are most easily and cleanly modified. Since the "Port bootstrap build-ordering tool" project in the GSoC 2012, the process of finding source packages which make sense to modify can be semi-automated. Once that is done, building bootstrap-able packages can be a fully automated process.

This spec caters for multiple stages of bootstrap build, so that, if necessary, a package can be built several times with progressively fewer restrictions (e.g. stage1, then stage2) before the final, normal build.

The reduced dependencies are specified in the control file, using a concept called build profiles. The build profile format was proposed by Guillem Jover, together with other solutions he presented in bug#661538. Build profiles extend the Build-Depends format with a syntax similar to architecture restrictions, but using angle brackets (< and >) instead of square brackets. The current spec is detailed in BuildProfileSpec. Previous revisions were discussed on debian-devel and debian-dpkg.

Build-Depends: huge (>= 1.0) [i386 arm] <!cross !pkg.somepkg.nosomefeature>, tiny

The build dependency "huge" would not be required by the source package if it is built in the "cross" or "pkg.somepkg.nosomefeature" profile. This mechanism neatly allows for removed build-deps, replaced build-deps and added build-deps, and an arbitrary number of possible 'profiles'.
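To make the restriction syntax concrete, here is a minimal Python sketch of how a tool might decide whether a build-dependency applies for a given set of active profiles. It assumes the semantics described in BuildProfileSpec, simplified: all terms inside one <...> group must hold (a "!name" term holds when that profile is not active), and a dependency with several groups applies if any one group holds. The function names are hypothetical; dpkg has its own implementation.

```python
import re


def parse_restriction_formula(text):
    """Parse the restriction part of a dependency, e.g.
    '<!cross !pkg.somepkg.nosomefeature> <stage1>', into a list of groups.
    Each group is a list of (negated, profile_name) pairs."""
    groups = []
    for body in re.findall(r"<([^<>]*)>", text):
        terms = []
        for term in body.split():
            negated = term.startswith("!")
            terms.append((negated, term[1:] if negated else term))
        groups.append(terms)
    return groups


def dependency_applies(groups, active_profiles):
    """Return True if the dependency is needed for the active profiles.

    No restriction: always needed. Otherwise it is needed if at least one
    group holds; a group holds when every term in it holds, and '!name'
    holds when the profile 'name' is not active.
    """
    if not groups:
        return True
    active = set(active_profiles)
    return any(
        all((name not in active) if negated else (name in active)
            for negated, name in terms)
        for terms in groups
    )
```

With the Build-Depends line above, huge is needed for a plain build but not when the cross profile is active: dependency_applies(parse_restriction_formula("<!cross !pkg.somepkg.nosomefeature>"), ["cross"]) returns False. The parser is only meant for the restriction part of a dependency, because version constraints such as (<< 2.0) also contain angle brackets.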

Besides bootstrapping, build profiles can also be used for embedded builds, for build-dependencies that change when cross-building, and for build-dependencies that are only needed to run tests.

This scheme supersedes an earlier version (referred to as 'staged' builds), which used repeated Build-Depends-StageN: lines. See dpkg bug#661538 for how this evolved. The profile labels are arbitrary, but agreement on label usage is necessary; the profile namespace is defined in BuildProfileSpec.

An environment variable (DEB_BUILD_PROFILES) controls when packages are built in reduced (staged/bootstrap) mode, and at what stage. debian/rules can check this variable and leave out some optional features to reduce the dependency tree (e.g. building kerberos without LDAP support, or avahi without GTK and Qt). dpkg-buildpackage and dpkg-checkbuilddeps also check the reduced/changed build-dependencies instead of the normal ones.
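debian/rules is normally a makefile, but the logic is the same in any language. The following Python sketch reads the space-separated DEB_BUILD_PROFILES value and drops an optional feature (LDAP support) for a hypothetical kerberos-like package when stage1, or a hypothetical package-specific profile pkg.example.noldap, is active. The configure options are illustrative and not taken from any real package.

```python
import os


def active_profiles(environ=None):
    """Return the set of profiles listed in DEB_BUILD_PROFILES
    (a space-separated list; unset means no profiles)."""
    env = os.environ if environ is None else environ
    return set(env.get("DEB_BUILD_PROFILES", "").split())


def configure_flags(profiles):
    """Hypothetical configure options for a kerberos-like package.

    LDAP support is dropped in a stage1 build, or when the hypothetical
    package-specific profile pkg.example.noldap is active, so that the
    LDAP build-dependency is not needed.
    """
    flags = ["--prefix=/usr"]
    if "stage1" in profiles or "pkg.example.noldap" in profiles:
        flags.append("--without-ldap")
    else:
        flags.append("--with-ldap")
    return flags
```

The matching control-file entry would put a <!stage1 !pkg.example.noldap> restriction on the LDAP build-dependency, so that the rules logic and the declared build-dependencies agree.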

An environment variable is used because it is preserved and passed on by all build tools, and because selecting a profile is a build option, like the others that are controlled this way (such as DEB_BUILD_OPTIONS).

So setting DEB_BUILD_PROFILES=cross will cause dpkg-buildpackage to call dpkg-checkbuilddeps with -Pcross so that cross build dependencies are checked, rather than the normal set.
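The equivalence described above can be sketched as follows: the profiles listed in DEB_BUILD_PROFILES become a comma-separated -P argument, mirroring the -Pcross form quoted here. The function name is hypothetical and it only builds the command line; it is not dpkg-buildpackage's actual code, and the dpkg tools also read DEB_BUILD_PROFILES directly.

```python
import os


def checkbuilddeps_command(environ=None):
    """Build (but do not run) a dpkg-checkbuilddeps command line that checks
    the build-dependencies for the profiles in DEB_BUILD_PROFILES."""
    env = os.environ if environ is None else environ
    profiles = env.get("DEB_BUILD_PROFILES", "").split()
    cmd = ["dpkg-checkbuilddeps"]
    if profiles:
        # Several profiles are passed as one comma-separated list.
        cmd.append("-P" + ",".join(profiles))
    return cmd
```

For example, checkbuilddeps_command({"DEB_BUILD_PROFILES": "cross"}) returns ['dpkg-checkbuilddeps', '-Pcross'].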

Bootstrapped/staged packages are marked as such (in the control file) with a new field called Built-For-Profiles:. They are not uploaded to normal repositories, because it is important not to mistake a bootstrap/staged package for a 'real' (normally built) package. A bootstrap package should be rebuilt as a full package as soon as possible, to avoid having to rebuild many packages against the full version once it is available. A version suffix is no longer useful for telling them apart, because the .buildinfo file records the profiles used for building, and a version suffix would break reproducibility.
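A minimal sketch of the check implied here: parse a control-style stanza and refuse to treat it as a normal package if it carries a non-empty Built-For-Profiles field. The parser handles only a single stanza with simple continuation lines, the helper names are hypothetical, and the sketch does not try to model which files carry the field in practice.

```python
def parse_stanza(text):
    """Minimal parser for one control-style stanza: 'Field: value' lines,
    with continuation lines (starting with whitespace) appended.
    Field names are lower-cased because they are case-insensitive."""
    fields = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last is not None:
            fields[last] += "\n" + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip().lower()
        fields[last] = value.strip()
    return fields


def built_for_profiles(text):
    """Return the profiles recorded in Built-For-Profiles (empty if none)."""
    return parse_stanza(text).get("built-for-profiles", "").split()


def may_upload_to_normal_archive(text):
    """Only packages built without any profile belong in a normal archive."""
    return not built_for_profiles(text)
```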

Terminology

These are usually called 'profile' builds, but the terms 'staged' builds and 'bootstrap' builds have also been used as the concept developed.

Available Patches

Patches are tracked on BuildProfileSpec

The dpkg bug (bug#661538) gives a useful idea of how this spec has developed.

Various older patches adding support for profile builds to individual packages are here: https://people.debian.org/~wookey/bootstrap/patches/profiles/packages/

Cross-building

Bootstrapping is closely related to support for cross-building Debian packages because at least part of the process must be done cross. Enough packages to make a bootable image need to be cross-buildable, because you cannot magic a system out of thin air. To move from cross to native building you need build-essential to be cross-buildable.

The number of build loops that must be broken for cross-building is much smaller than the number that need to be broken for native building. This spec proposes that we start by fixing the loops that prevent even cross-building a base Debian image, before going on to fix all the packages which have native build-dependency loops.

Debian/Ubuntu cross-building is documented here: https://wiki.linaro.org/Platform/DevPlatform/CrossCompile/CrossBuilding

Patched sources, with build-profile patches applied and extra multiarch and profile build support in the package metadata, are in https://people.debian.org/~wookey/bootstrap.html

For cross-building to be reliable, cross-dependency metadata needs to be in packages, so that it is clear whether a build-dependency should be satisfied by the build architecture or the host architecture. Multiarch information can be used to provide this, along with build-dependency decoration for the fairly rare exceptions. Details are specified here: MultiarchCross
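A simplified sketch of that decision for one build-dependency in a cross build. It assumes the usual multiarch rules: a :native annotation or a Multi-Arch: foreign target means the build architecture, :any is paired with Multi-Arch: allowed packages, and everything else defaults to the host architecture. It ignores alternatives, versions and Provides; see MultiarchCross for the authoritative details. The function name is hypothetical.

```python
def resolving_architecture(annotation, multi_arch, arch_all=False):
    """Simplified: which architecture should satisfy one Build-Depends
    entry in a cross build.

    annotation: None, "native" or "any" (the :suffix on the dependency)
    multi_arch: the depended-on package's Multi-Arch value, or None
    arch_all:   True if the depended-on package is Architecture: all
    Returns "build", "host" or "either".
    """
    if annotation == "native":
        return "build"   # explicitly a build-architecture tool
    if annotation == "any":
        # :any is meant for Multi-Arch: allowed packages (e.g. interpreters);
        # for a build tool the build-architecture copy is the natural choice.
        return "build"
    if arch_all:
        return "either"  # architecture-independent content
    if multi_arch == "foreign":
        return "build"   # another architecture's copy will do, so use the build one
    return "host"        # default, e.g. libraries the package links against
```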

The current state of buildability using this technology is recorded here: http://people.linaro.org/~wookey/buildd/ and http://people.canonical.com/~cjwatson/cross/armhf/raring/

Until useful amounts of multiarch metadata are in packages, heuristics must be used, as implemented in xdeb (and the now-deprecated apt-cross), or all dependencies must be installed for both the host and the native (build) architecture, as implemented in xapt. These are all ugly and horrid, but better than nothing. xdeb or xapt work better than multiarch-cross in wheezy and precise, but multiarch-cross is where all the current work is, and as of end 2012 it works for much of the base system in Debian unstable and Ubuntu raring. From Maverick and Jessie onwards multiarch cross-building is the recommended method.

Automated Bootstrapping

The fully automated bootstrapping process needs to keep track of staged builds and rebuild things as needed, so that bootstrap packages don't hang around any longer than necessary. However, any such tool could get out of sync with the current status, unless that status can always be determined from the current state of the package set. This spec attempts to define things so that this state is always intrinsic to the package set.

Packages do need to be uploaded to a bootstrap repository in order to satisfy build-dependencies (that's the whole point). However, because packages of the same version get built, and thus uploaded, more than once, we need a way to deal with this. One proposal is to append ~stageN+M to the package version automatically, where N is the stage number and M a number continuously incremented by the buildd. Alternatively, use binNMUs, which are already handled very well by the tools and the package management system; almost all packages are (supposedly) binNMU-safe. This was awkward in combination with multiarch until the separate binNMU changelog feature was implemented.
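To see how the two suffixes order against the final version, the following Python sketch reimplements dpkg's version comparison: epoch first, then upstream version, then Debian revision, with '~' sorting before everything, even the end of the string. It is an illustration, not a replacement for dpkg --compare-versions, and the two suffix-building helpers are hypothetical.

```python
DIGITS = "0123456789"


def stage_version(version, stage, counter):
    """Hypothetical helper: the ~stageN+M suffix proposed above."""
    return f"{version}~stage{stage}+{counter}"


def binnmu_version(version, n):
    """Hypothetical helper: binNMUs append +bN to the version."""
    return f"{version}+b{n}"


def _isdigit(c):
    return c != "" and c in DIGITS


def _order(c):
    # dpkg's ordering for non-digit characters: '~' sorts before everything
    # (even the end of the string), letters sort before other characters,
    # and digits/end-of-string count as 0.
    if c == "" or _isdigit(c):
        return 0
    if c.isascii() and c.isalpha():
        return ord(c)
    if c == "~":
        return -1
    return ord(c) + 256


def _verrevcmp(a, b):
    """Compare an upstream version or revision string the way dpkg does."""
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Non-digit run: compare character by character.
        while (i < len(a) and not _isdigit(a[i])) or (
            j < len(b) and not _isdigit(b[j])
        ):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: skip leading zeros, then compare numerically.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        while i < len(a) and j < len(b) and _isdigit(a[i]) and _isdigit(b[j]):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and _isdigit(a[i]):
            return 1
        if j < len(b) and _isdigit(b[j]):
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version):
    """Split [epoch:]upstream[-revision] into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return <0, 0 or >0 depending on whether a sorts before, equal to or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
```

A ~stage suffix sorts below the final version, so a later normal build of 1.2-3 supersedes 1.2-3~stage1+4, whereas a binNMU such as 1.2-3+b1 sorts above the version it rebuilds.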

See 'Changed binary packages' below for a discussion of how profiles causing a binary package normally built to be omitted should be dealt with. The meta-data in package Build-Deps allows static analysis tools to determine in advance if the package set has a fully linearisable build-order or not. The botch tool can do this analysis.
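The simplest form of this static analysis is a topological sort of the source-level build-dependency graph. The following Python sketch uses Kahn's algorithm. It is a toy model (one edge per source-package build-dependency, no alternatives, no binary-package level), whereas botch works on the real archive metadata and does considerably more. The function name and the package names in the usage note are made up.

```python
from collections import deque


def build_order(build_deps):
    """Topologically sort source packages (Kahn's algorithm).

    build_deps maps a source package to the set of source packages that
    must be built before it. Returns (order, stuck): a valid build order
    for everything that can be ordered, and the sorted list of packages
    that are in a build-dependency cycle or depend on one.
    """
    packages = set(build_deps)
    for deps in build_deps.values():
        packages |= set(deps)
    remaining = {p: set(build_deps.get(p, ())) for p in packages}
    dependents = {p: set() for p in packages}
    for p, deps in remaining.items():
        for d in deps:
            dependents[d].add(p)

    # Start with packages that have no unbuilt build-dependencies.
    ready = deque(sorted(p for p, deps in remaining.items() if not deps))
    order = []
    while ready:
        p = ready.popleft()
        order.append(p)
        for q in sorted(dependents[p]):
            remaining[q].discard(p)
            if not remaining[q]:
                ready.append(q)

    built = set(order)
    stuck = sorted(p for p in packages if p not in built)
    return order, stuck
```

build_order({"x": {"y"}, "y": {"x"}, "z": {"x"}, "w": set()}) returns (["w"], ["x", "y", "z"]): x and y form a loop and z is stuck behind it. Dropping x's build-dependency on y under a profile gives {"x": set(), "y": {"x"}}, which yields the order ["x", "y"].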

Heuristics to detect bootstrapping problems

Ideally the following would be turned into a tool that detects and reports these problems, so that maintainers can fix them. The results could be shown on a site such as the PTS or tracker.debian.org.

  • If a package contains Perl .so files (XS modules; the same check could be done for Python or other language modules) and is Build-Depended on by another package, it must be a build tool or be used by one, because the program being built should not link against those modules. Such Build-Depends need to be marked :native (reliable).
  • If a package, once irrelevant content such as /usr/share/doc is excluded, contains only *.so* files, it needs to be tagged M-A:same (reliable).
  • If an Arch:any package, once irrelevant content is excluded, contains only scripts in bin/*, and those scripts have matching digests on different architectures, it is probably M-A:foreign safe (it might still call architecture-specific interfaces) (guess).
  • If a package has unsatisfiable build dependencies, and every package in its build-dependency chain except one is marked with an M-A value, then the unmarked package is at fault (reliable).
  • A hard-coded list of interpreters could be maintained; those would need to be marked M-A:allowed (reliable).
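The file-based heuristics (the M-A:same and M-A:foreign checks above) could start out as simple checks over package file lists. A Python sketch, assuming the file lists and per-architecture SHA-256 digests have already been extracted from the .debs; the ignored-path prefixes beyond /usr/share/doc are an assumption, the function names are hypothetical, and, like the heuristic itself, the foreign check is only a guess (it checks location and identical content, not whether a file really is a script).

```python
from fnmatch import fnmatchcase

# Paths treated as irrelevant: /usr/share/doc/ comes from the text, the
# other prefixes are assumptions for this sketch.
IGNORED_PREFIXES = ("/usr/share/doc/", "/usr/share/man/", "/usr/share/lintian/")


def relevant_files(paths):
    """Drop documentation and similar files that say nothing about the ABI."""
    return [p for p in paths if not p.startswith(IGNORED_PREFIXES)]


def looks_multiarch_same(paths):
    """Heuristic: every relevant file is a shared library (*.so*)."""
    files = relevant_files(paths)
    return bool(files) and all(
        fnmatchcase(p.rsplit("/", 1)[-1], "*.so*") for p in files
    )


def looks_multiarch_foreign(digests_by_arch):
    """Guess: the relevant files all live in a bin/ directory and have
    identical digests on every architecture.

    digests_by_arch maps an architecture name to {path: hex digest}.
    File contents are not inspected, so this does not verify that the
    files really are scripts.
    """
    tables = [
        {p: d for p, d in digests.items() if not p.startswith(IGNORED_PREFIXES)}
        for digests in digests_by_arch.values()
    ]
    if not tables or not tables[0]:
        return False
    first = tables[0]
    only_bin = all("/bin/" in p for p in first)
    return only_bin and all(table == first for table in tables[1:])
```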

History

This section collects some information about how this concept has developed over time in order to make the above page clearer.

The initial proposal was written down in 2010 as a result of a UDS (Ubuntu Developer Summit) discussion: Specs/M/ARMAutomatedBootstrap. Back then the concept was called 'Staged Builds', and used a simple extra line in the control file, Build-Depends-StageN:, repeating and modifying the normal build-dependencies, with builds controlled by DEB_BUILD_OPTIONS=profile=stage1.

This made it very easy to implement, as existing tools would not barf on it: no new syntax was used. However, it made for an ugly implementation inside dpkg, with many extra fields defined and a limit (of 2) on how many stages could be supported. No other profile types were possible without hard-defining more fields inside dpkg. See bug#661538 for how that developed.

Three successive years of GSoC projects, by Gustavo Alkmim, P. J. McDermott, Johannes Schauer and Thibaut Girka, advanced the practical and theoretical work. Heavy sponsorship of this work by Linaro and Canonical also enabled a lot of the supporting cross-building and multiarch work to get done.

It became clear from actually doing staged builds that it made more sense to describe cross-building, test-skipping, doc-skipping and language/tool extra-dependency (bootstrapping) changes independently, rather than lumping them together arbitrarily in whatever way a bootstrap on a particular arch might like. So the newer profile modifier syntax was developed and implemented, whilst being tested on actual cross-builds and bootstraps.

During 2013 an actual spec was pinned down and agreed, and an implementation in dpkg was completed, leading to a profile-supporting dpkg upload at the end of 2013.

These are some related activities and documents which have generated input for this one.

Talks

Credits

Thanks to Jonathan Austin, Steve McIntyre, Steve Langasek, Loic Minier, P. J. McDermott, and Johannes Schauer for helping clarify the thoughts described above.


CategoryEmdebian | CategoryDeveloper