With free software, anyone can inspect the source code for malicious flaws. But Debian provides binary packages to its users. The idea of “deterministic” or “reproducible” builds is to empower anyone to verify that no flaws have been introduced during the build process by reproducing byte-for-byte identical binary packages from a given source.
Why do we want reproducible builds?
- Allow independent verification that a binary matches what the source was intended to produce.
- Help co-installation of “Multi-Arch: same” packages, which requires every file they have in common to be byte-identical.
- Be able to generate debug symbols for packages which do not have a “debug package”.
- Ensure packages can be built from source. The archive could be made to accept only reproducible uploads: the maintainer would stop uploading .deb files but would keep them referenced in the .changes. A buildd would then build the source, and the upload would be accepted only if the hashes match.
- Allow file-level deduplication, on Debian mirror sites or perhaps on snapshot.debian.org, of .deb files whose contents did not really change between versions.
- Allow .deb deltas to be smaller.
- Packages with build profiles must offer exactly the same functionality for all profiles. Reproducible builds could be used to verify that this is the case.
- Make sure that Architecture: all packages are built identically on different build architectures.
- Validate cross-builds against native builds.
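The upload-acceptance idea above boils down to a hash comparison. A minimal Python sketch, assuming the SHA-256 digests have already been parsed out of the Checksums-Sha256 field of the .changes and that the buildd's rebuilt .deb files have been read into memory; `sha256_hex` and `accept_upload` are hypothetical names, not part of any Debian tool:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def accept_upload(declared: dict[str, str], rebuilt: dict[str, bytes]) -> bool:
    """Accept only if the buildd rebuilt exactly the declared files and each
    rebuilt .deb hashes to the digest the maintainer listed in the .changes.

    declared: {filename: sha256 hex digest} taken from the .changes (assumed parsed)
    rebuilt:  {filename: file contents} produced by the buildd
    """
    if set(declared) != set(rebuilt):
        return False  # a binary package is missing or an unexpected one appeared
    return all(sha256_hex(rebuilt[name]) == digest
               for name, digest in declared.items())


# Hypothetical example: the rebuild matches, then differs by one byte.
deb = b"pretend .deb contents"
declared = {"hello_2.10-1_amd64.deb": sha256_hex(deb)}
print(accept_upload(declared, {"hello_2.10-1_amd64.deb": deb}))            # True
print(accept_upload(declared, {"hello_2.10-1_amd64.deb": deb + b"\x00"}))  # False
```

Requiring the set of file names to match, not just each digest, means a rebuild that silently drops or adds a binary package is rejected as well.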
Reproducing builds
There are two sides to the problem: the build environment needs to be recorded during the initial build, and the same environment needs to be reproduced for later rebuilds.
Recording the environment
Information on a build will be recorded in a new control file with extension `.buildinfo`.
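To make the idea concrete, here is a hypothetical Python sketch of how a build tool might write such a file. It assumes a deb822-style layout; the field names (`Source`, `Version`, `Architecture`, `Checksums-Sha256`, `Installed-Build-Depends`) follow Debian control-file conventions but are shown for illustration only, not as a specification of `.buildinfo`. The function name `make_buildinfo` and the package versions in the example call are likewise made up.

```python
import hashlib


def make_buildinfo(source: str, version: str, architecture: str,
                   installed: dict[str, str], artifacts: dict[str, bytes]) -> str:
    """Render a deb822-style stanza recording one build (illustrative fields)."""
    lines = [
        f"Source: {source}",
        f"Version: {version}",
        f"Architecture: {architecture}",
        "Checksums-Sha256:",
    ]
    # One continuation line per built file: "<sha256> <size> <filename>".
    for name in sorted(artifacts):
        data = artifacts[name]
        lines.append(f" {hashlib.sha256(data).hexdigest()} {len(data)} {name}")
    # Exact version of every package installed in the build environment,
    # written as a comma-separated relation list, one entry per line.
    lines.append("Installed-Build-Depends:")
    entries = [f" {pkg} (= {ver})" for pkg, ver in sorted(installed.items())]
    lines.extend(entry + "," for entry in entries[:-1])
    lines.extend(entries[-1:])
    return "\n".join(lines) + "\n"


# Hypothetical example build.
print(make_buildinfo("hello", "2.10-1", "amd64",
                     {"libc6-dev": "2.19-18", "gcc-4.9": "4.9.2-10"},
                     {"hello_2.10-1_amd64.deb": b"example bytes"}))
```

Sorting both the artifacts and the installed packages keeps the output independent of dictionary insertion order, so the recording file is itself reproducible.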
Reproduce the build environment
The srebuild program is an sbuild wrapper which finds a snapshot.debian.org timestamp at which all the binary package versions listed in a .buildinfo file were available, and then carries out the build with exactly those versions installed.
See srebuild.
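The core of that search is finding one moment when every required package version was in the archive. The following is not srebuild's code; it is a hypothetical Python sketch that assumes the availability intervals of each package version have already been obtained from snapshot.debian.org (how they are fetched is out of scope). The `Availability` alias, `find_common_timestamp` and the example data are all illustrative assumptions.

```python
from typing import Optional

# Hypothetical availability data: for each (package, version) pair, the list of
# closed [first_seen, last_seen] intervals during which the archive carried it.
# Timestamps use the fixed-width "YYYYMMDDTHHMMSSZ" form, so plain string
# comparison orders them chronologically.
Availability = dict[tuple[str, str], list[tuple[str, str]]]


def find_common_timestamp(avail: Availability) -> Optional[str]:
    """Return the earliest timestamp at which every package version is present,
    or None if no single snapshot contains them all."""
    if not avail:
        return None
    # If the intersection is non-empty, its earliest point is the start of
    # some interval, so only interval starts need to be tried.
    candidates = sorted({start for spans in avail.values() for start, _ in spans})
    for t in candidates:
        if all(any(start <= t <= end for start, end in spans)
               for spans in avail.values()):
            return t
    return None


avail = {
    ("gcc-4.9", "4.9.2-10"): [("20150101T000000Z", "20150601T000000Z")],
    ("libc6-dev", "2.19-18"): [("20150301T000000Z", "20150901T000000Z")],
}
print(find_common_timestamp(avail))  # 20150301T000000Z
```

Allowing several intervals per package version covers versions that left the archive and later reappeared; if no candidate works, the build environment cannot be recreated from a single snapshot.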
References
- Deterministic virtual machines:
Reflections on Trusting Trust, by Ken Thompson
Fully Countering Trusting Trust through Diverse Double-Compiling (DDC) - a PhD dissertation on how to use reproducible builds to counter the "trusting trust" attack on compilers
Is that really the source code for this software? by Jos van den Oever on blogs.kde.org (2013-06-19). Compares attempts to reproduce tar from the Debian, Fedora and openSUSE packages.
Deterministic Builds Part One: Cyberwar and Global Compromise by Mike Perry
Deterministic Builds Part Two: Technical Details by Mike Perry
Verifying the source code for binaries by Jake Edge in Linux Weekly News.
Colin Watson's answer on ubuntu-devel to “Will Ubuntu use "reproducible builds" as debian is planning to do?”
Why and How of Reproducible Builds: Distrusting Our Own Infrastructure for Safer Software Releases, Seth Schoen and Mike Perry at Mozilla San Francisco, 2014-11-05
- Misc. upstream discussions:
Octave: bug report and mailing list thread
groff: mailing list thread
GHC (Glasgow Haskell Compiler): #4012
Presentations
Reproducible Builds for Debian, Distributions devroom, FOSDEM’14, Video, Slides (Sources)
Reproducible Builds, a year later, DebConf14, Video, Slides (Sources)
Reproducible Builds, Moving Beyond Single Points of Failure for Software Distribution, 31st Chaos Communication Congress, Video, Slides
Related projects
CARE monitors the execution of the specified command to create an archive that contains all the material required to re-execute it in the same context.
Further work
Having reproducible builds allows us to trust binary packages better, because it becomes easier to have:
- diversity of buildd location and jurisdiction - build packages in more than one location, including the developer's
- diversity of buildd hardware, in case of hardware bugs or malicious implants - a mix of VMs, some real hardware, different CPU manufacturers, different dates of manufacture and suppliers
- diversity of people - multiple signatures on a .changes file
- diversity of kernels, explained below
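Build diversity only helps if the results from the different builders are actually compared. A minimal Python sketch of one possible, hypothetical acceptance policy: each builder reports the SHA-256 of the .deb it produced, and a digest is trusted only if at least `quorum` builders report it and no builder reports anything else. The function name `consensus_digest` and the builder names are illustrative assumptions.

```python
from typing import Optional


def consensus_digest(results: dict[str, str], quorum: int) -> Optional[str]:
    """Given {builder_id: sha256 of the resulting .deb}, return the digest only
    if at least `quorum` builders produced it and no builder disagrees."""
    digests = set(results.values())
    if len(digests) != 1:
        return None  # no results at all, or some builder produced different bytes
    (digest,) = digests
    return digest if len(results) >= quorum else None


# Hypothetical builders in different locations, on different hardware.
results = {"buildd-a": "ab12", "developer-laptop": "ab12", "vm-other-cpu": "ab12"}
print(consensus_digest(results, quorum=2))  # ab12
```

Rejecting on any disagreement, rather than taking a majority vote, is deliberate: with reproducible builds, a mismatch means either the package is not reproducible or one builder is compromised, and both deserve investigation before the binary is trusted.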
Kernel packages
Special features of kernel packages (including bootloaders and hypervisors) - GRUB2, Xen, linux, kfreebsd...
- we put huge trust in them - kernels are the ultimate target of any rootkit, able to completely hide from userland
- a kernel image built for amd64, if the build system is portable and reproducible enough, will be the same whether it is built on a linux-amd64 or a kfreebsd-amd64 system
- or even on hosts running different kernel versions - for example, a jessie build chroot on a wheezy host system
Then we would be better protected against anything that could affect many systems at once, such as a kernel vulnerability or widespread infection by a rootkit, which would then have to be compatible with more than one type of kernel to go unnoticed.