It should be possible to reproduce, byte for byte, every build of every package in Debian.
This is an on-going project. To participate, we recommend you create an account on this wiki, subscribe to this page, and join the reproducible-builds@lists.alioth.debian.org mailing list.
Contents
- Drivers
- Why do we want reproducible builds?
- Status
- Useful things you (yes, you!) can do
- Reported bugs
- Archive wide rebuilds
- Reproducing builds
- Identified problems, and possible solutions
- Files in data.tar.gz contains build paths
- Files in data.tar.gz depends on readdir order
- Files in data.tar.gz varies with the locale
- Files in data.tar.gz contains hostname, uname output, username
- Files in data.tar.gz contains timestamps
- Random variation
- Members of control.tar have varying mtime
- {data,control}.tar.{gz,xz,bz2} does not have timestamps
- {data,control}.tar.{gz,xz,bz2} will store files in readdir order
- .deb ar-archive header contains a timestamp
- building the kernel
- Custom build environment
- bash script to compare two package builds
- How to build a deb using faketime
- Further work
- References
- Presentations
- Publicity
- Related projects
Drivers
- Lunar
Why do we want reproducible builds?
- Allow independent verifications that a binary matches what the source intended to produce.
- Help co-installation of Multi-Arch: same packages (as they need every matching file to be byte identical).
- Be able to generate debug symbols for packages which do not have a “debug package”.
- Ensure packages can be built from source. The archive could be made to only accept reproducible uploads: the maintainer would stop uploading .deb files but keep them referenced in the .changes. A buildd would then build the source. Only if the hash matches the upload gets accepted.
- Allow file-level deduplication on Debian mirror sites, or maybe snapshots.d.o, of .deb files whose contents didn't really change between versions.
- Allow .deb deltas to be smaller.
- Packages with build profiles must offer the exact same functionality for all profiles. Reproducible builds could be used to verify that this is the case.
Status
- Current focus is on the toolchain: trying to make as few changes as possible in key packages to make as many builds as possible reproducible.
- On 2013-09-07, a first rebuild of the archive was done by David Suárez with a modified version of dpkg varying build path and time. This has already uncovered several archive-wide issues that we can address.
- A build tool that would reproduce a build environment using packages from snapshot.debian.org is still missing.
Useful things you (yes, you!) can do
- Sort out the results of the latest rebuild. See some notes on how to do so.
- Find a way to prevent jar from writing timestamps.
- Find a way to prevent javadoc from writing timestamps.
- Find a way to prevent Epydoc from writing timestamps and output links in filesystem order.
- Understand why binaries produced by Mono are different.
- Write a patch for dpkg-buildpackage to export GZIP=-n (or maybe not, maintainers were quite unhappy with dpkg-buildflags exporting CFLAGS).
- Write a script to rebuild a package from a .changes file. It should only be about adding the version of snapshot.debian.org matching the time of the build to the pbuilder/sbuild chroot before starting the build process.
- Write a script to rebuild a package from a list of binary packages and their respective versions. See below for some preliminary work by Lunar that has already been done. One way to do that is to download the deb files from snapshot.debian.org and then put them into a chroot with pbuilder, and then use pbuilder to do a build of the desired Debian package.
- Research other distributions: NixOS, SUSE (see build-compare), then write about it on your blog and link to it on this wiki page.
If you want to help with this, feel free to ping the mailing list or edit this wiki page.
Reported bugs
All bugs relevant to the reproducible builds project should use usertags with user reproducible-builds@lists.alioth.debian.org.
Current usertags in use:
- toolchain: affects a tool used by other package build systems
- infrastructure: affects the whole Debian infrastructure or policies
- timestamps: time of build is recorded during the build process
- fileordering: build output varies with readdir() order
- buildpath: path of sources is recorded during the build process
Archive wide rebuilds
2013-09-07 by David Suárez. 24% of 5240 source packages reproducible. Variations: time, build path.
2014-01-26 by David Suárez. 67% of 6887 source packages reproducible. Variations: time, build path.
Reproducing builds
There are two sides to the problem: first we need to record the initial build environment, and then we need a way to set up the same environment.
Recording the environment
Information on a build will be recorded in a new control file with extension .buildinfo.
.buildinfo example
The following file would be named fweb_1.62-12_i386.buildinfo:
Format: 1.9
Build-Architecture: i386
Source: fweb
Version: 1.62-12
Binary: fweb fweb-doc
Architecture: all i386
Checksums-Sha256:
 9921500c4c6159c0019d4b8b600d2d06eef6b1da056abd2f78e66a9f0c3843b9 879 fweb_1.62-12.dsc
 3a7492c2013fbeebff08bee0514481ec0f56d2c4d138188d1ef85156d08ded00 436982 fweb-doc_1.62-12_all.deb
 a916dbb1c63707eaf52a5cdd10769871d2f621848176dc8f7ab4f0dcd999af85 229990 fweb_1.62-12_i386.deb
Build-Environment:
 acl (= 2.2.52-1), adduser (= 3.113+nmu3), base-files (= 7.5),
 base-passwd (= 3.5.33), bash (= 4.3-9), binutils (= 2.24.51.20140818-1),
 bsdmainutils (= 9.0.5), bsdutils (= 1:2.20.1-5.8), build-essential (= 11.7),
 bzip2 (= 1.0.6-7), coreutils (= 8.21-1.2), cpp (= 4:4.9.1-3),
 cpp-4.9 (= 4.9.1-9), dash (= 0.5.7-4), debconf (= 1.5.53),
 debhelper (= 9.20140817), debianutils (= 4.4), dh-buildinfo (= 0.11),
 diffutils (= 1:3.3-1), dmsetup (= 2:1.02.88-1), dpkg (= 1.17.13),
 dpkg-dev (= 1.17.13), e2fslibs (= 1.42.11-2), e2fsprogs (= 1.42.11-2),
 file (= 1:5.19-1), findutils (= 4.4.2-9), g++ (= 4:4.9.1-3),
 g++-4.9 (= 4.9.1-9), gcc (= 4:4.9.1-3), gcc-4.9 (= 4.9.1-9),
 gcc-4.9-base (= 4.9.1-9), gettext (= 0.19.2-1), gettext-base (= 0.19.2-1),
 grep (= 2.20-2), groff-base (= 1.22.2-6), gzip (= 1.6-3),
 hostname (= 3.15), init (= 1.21), initscripts (= 2.88dsf-53.4),
 insserv (= 1.14.0-5), intltool-debian (= 0.35.0+20060710.1),
 libacl1 (= 2.2.52-1), libasan1 (= 4.9.1-9), libasprintf0c2 (= 0.19.2-1),
 libatomic1 (= 4.9.1-9), libattr1 (= 1:2.4.47-1), libaudit1 (= 1:2.3.7-1),
 libaudit-common (= 1:2.3.7-1), libblkid1 (= 2.20.1-5.8),
 libbz2-1.0 (= 1.0.6-7), libc6 (= 2.19-10), libc6-dev (= 2.19-10),
 libcap2 (= 1:2.24-4), libcap2-bin (= 1:2.24-4), libc-bin (= 2.19-10),
 libc-dev-bin (= 2.19-10), libcilkrts5 (= 4.9.1-9),
 libcloog-isl4 (= 0.18.2-1), libcomerr2 (= 1.42.11-2),
 libcroco3 (= 0.6.8-3), libcryptsetup4 (= 2:1.6.6-1), libdb5.3 (= 5.3.28-6),
 libdbus-1-3 (= 1.8.6-2), libdebconfclient0 (= 0.191),
 libdevmapper1.02.1 (= 2:1.02.88-1), libdpkg-perl (= 1.17.13),
 libffi6 (= 3.1-2), libgcc1 (= 1:4.9.1-9), libgcc-4.9-dev (= 4.9.1-9),
 libgcrypt11 (= 1.5.4-2), libgcrypt20 (= 1.6.2-2), libgdbm3 (= 1.8.3-13),
 libglib2.0-0 (= 2.40.0-4), libgmp10 (= 2:6.0.0+dfsg-6),
 libgomp1 (= 4.9.1-9), libgpg-error0 (= 1.13-3), libintl-perl (= 1.23-1),
 libisl10 (= 0.12.2-2), libitm1 (= 4.9.1-9), libkmod2 (= 18-1),
 liblzma5 (= 5.1.1alpha+20120614-2), libmagic1 (= 1:5.19-1),
 libmount1 (= 2.20.1-5.8), libmpc3 (= 1.0.2-1), libmpfr4 (= 3.1.2-1),
 libncurses5 (= 5.9+20140712-2), libncurses5-dev (= 5.9+20140712-2),
 libncursesw5 (= 5.9+20140712-2), libpam0g (= 1.1.8-3.1),
 libpam-modules (= 1.1.8-3.1), libpam-modules-bin (= 1.1.8-3.1),
 libpam-runtime (= 1.1.8-3.1), libpcre3 (= 1:8.35-3),
 libpipeline1 (= 1.3.0-1), libprocps3 (= 1:3.3.9-7),
 libquadmath0 (= 4.9.1-9), libselinux1 (= 2.3-1), libsemanage1 (= 2.3-1),
 libsemanage-common (= 2.3-1), libsepol1 (= 2.3-1), libss2 (= 1.42.11-2),
 libstdc++-4.9-dev (= 4.9.1-9), libstdc++6 (= 4.9.1-9),
 libsystemd-journal0 (= 208-8), libsystemd-login0 (= 208-8),
 libtext-unidecode-perl (= 0.04-2), libtimedate-perl (= 2.3000-2),
 libtinfo5 (= 5.9+20140712-2), libtinfo-dev (= 5.9+20140712-2),
 libubsan0 (= 4.9.1-9), libudev1 (= 208-8), libunistring0 (= 0.9.3-5.2),
 libustr-1.0-1 (= 1.0.4-3), libuuid1 (= 2.20.1-5.8), libwrap0 (= 7.6.q-25),
 libxml2 (= 2.9.1+dfsg1-4), libxml-libxml-perl (= 2.0116+dfsg-1+b1),
 libxml-namespacesupport-perl (= 1.11-1), libxml-sax-base-perl (= 1.07-1),
 libxml-sax-perl (= 0.99+dfsg-2), linux-libc-dev (= 3.14.15-2),
 login (= 1:4.2-2+b1), lsb-base (= 4.1+Debian13), make (= 4.0-8),
 man-db (= 2.6.7.1-1), mawk (= 1.3.3-17), mount (= 2.20.1-5.8),
 ncurses-base (= 5.9+20140712-2), ncurses-bin (= 5.9+20140712-2),
 passwd (= 1:4.2-2+b1), patch (= 2.7.1-6), perl (= 5.20.0-4),
 perl-base (= 5.20.0-4), perl-modules (= 5.20.0-4),
 po-debconf (= 1.0.16+nmu3), procps (= 1:3.3.9-7), sed (= 4.2.2-4),
 sensible-utils (= 0.0.9), startpar (= 0.59-3), systemd (= 208-8),
 systemd-sysv (= 208-8), sysvinit-utils (= 2.88dsf-53.4),
 sysv-rc (= 2.88dsf-53.4), tar (= 1.27.1-2), texinfo (= 5.2.0.dfsg.1-4),
 tzdata (= 2014f-1), ucf (= 3.0030), udev (= 208-8),
 util-linux (= 2.20.1-5.8), xz-utils (= 5.1.1alpha+20120614-2),
 zlib1g (= 1:1.2.8.dfsg-2)
.buildinfo field descriptions
- Format
- Build-Architecture
The Debian machine architecture that was used to perform the build.
- Source
- Version
- Binary
- Architecture
Same as in .changes except source should not be specified: only concrete architectures, no wildcards or any.
- Checksums-Sha256
Same format as other control files. Must list the .dsc file and all files listed in `debian/files`.
- Build-Environment
List of all packages forming the build environment, their architecture if different from the build architecture, and their version. This includes Essential packages, build-essential, and Build-Depends and Build-Depends-Indep. For each package, its dependencies should be recursively listed. The format is the same as Built-Using.
The content of the Build-Environment is close to what dh-buildinfo currently produces.
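As a rough illustration of the Build-Environment format, the sketch below turns `package version` pairs into a Built-Using style list. The `format_build_env` helper is a hypothetical name, the sample input is made up from versions in the example above, and feeding it every installed package (as the commented `dpkg-query` call would) is a simplification of the recursive dependency listing described here.

```shell
#!/bin/sh
# Hypothetical helper: read "package version" lines on stdin and print
# them as "pkg (= version), pkg (= version), ...".
format_build_env() {
    LC_ALL=C sort | awk '
        NF == 2 {
            printf "%s%s (= %s)", sep, $1, $2
            sep = ", "
        }
        END { print "" }
    '
}

# Sample input for illustration. On a Debian system the pairs could
# instead come from:
#   dpkg-query -W -f='${Package} ${Version}\n'
build_env=$(printf '%s\n' 'gzip 1.6-3' 'bash 4.3-9' 'tar 1.27.1-2' |
    format_build_env)
echo "Build-Environment: $build_env"
```

Sorting in the C locale keeps the field itself stable across build hosts, whatever order the package manager reports.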
.buildinfo signatures
As .buildinfo files are meant to make it possible to reproduce a given build, multiple parties should be able to assess their content. .buildinfo files are thus signed using detached signatures, with the full fingerprint of the key in the filename.
Example file list:
hello_2.8-1_amd64.buildinfo
hello_2.8-1_amd64.buildinfo.0603CCFD91865C17E88D4C798382C95C29023DF9.asc
hello_2.8-1_amd64.buildinfo.0EE5BE979282D80B9F7540F1CCD2ED94D21739E9.asc
Inclusion of .buildinfo in the archive
.buildinfo files will be referenced by their assessors through a Build-Signed-Off-By field in the Packages index.
Example:
Package: hello
Version: 2.9-1
Installed-Size: 140
Maintainer: Santiago Vila <sanvila@debian.org>
Architecture: i386
[…]
Build-Signed-Off-By: Jérémy Bobbio <lunar@debian.org> 0603CCFD91865C17E88D4C798382C95C29023DF9,
 Santiago Vila <sanvila@debian.org> D54C3BFAFFB042DE382DA5D741CE7F0B9F1B8B32
Previous ideas
.changes files looked like a good place to record the environment as they list the checksums of the build products and are signed by either the maintainer or the buildd operator.
But the meaning of .changes files is pretty clear: they describe a transactional change operation on the archive. They are not saved directly in the archive: they are the equivalent of a log entry. The name of a .changes file is also not specified and multiple operations can have the same name.
(See also 719854 for the first attempt which tried using XC- field in debian/control.)
Reproduce the build environment
Actions:
- We need a script that would take a list of binary packages and their respective versions, install them in a chroot and start the build. Maybe based on pbuilder?
Ruby script that generates URL to .deb on snapshot.debian.org from a list of binary packages and their respective version: http://people.debian.org/~paulproteus/lunar-verify-script.rb
Here's another potential piece of the puzzle. The following script will convert a RFC822 date (as found in a .changes) to the URL of the last known archive state recorded by snapshot.debian.org. This might be useful to debootstrap the proper chroot before installing packages…
require 'date'
require 'uri'
require 'net/http'
require 'nokogiri'

changes_date = 'Mon, 30 Jan 2012 12:52:28 +0100'
build_date = DateTime.rfc822(changes_date)

# Fetch the list of archive runs for the month of the build.
url = "http://snapshot.debian.org/archive/debian/?year=#{build_date.year}&month=#{build_date.month}"
response = Net::HTTP.get_response(URI.parse(url))

# Keep the last run recorded before the build date.
run = nil
doc = Nokogiri::HTML(response.body)
doc.css('p a').each do |link|
  date = DateTime.parse(link.content)
  break if date >= build_date
  run = link['href']
end

puts "http://snapshot.debian.org/archive/debian/#{run}"
Note: it would probably be a lot better to add a new query to the machine interface of snapshot.d.o instead of parsing HTML.
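A shorter route may be to build the snapshot URL directly from the date. The sketch below assumes that snapshot.debian.org resolves a timestamp that falls between two archive runs to the closest earlier run; that behaviour should be verified before relying on it. It also needs GNU date for the `-d` option.

```shell
#!/bin/sh
# Convert the RFC 822 date found in a .changes file into a
# snapshot.debian.org archive URL.
changes_date='Mon, 30 Jan 2012 12:52:28 +0100'

# Archive runs are named with UTC timestamps of the form
# YYYYMMDDTHHMMSSZ, so convert the date to UTC first.
stamp=$(date -u -d "$changes_date" +%Y%m%dT%H%M%SZ)
snapshot_url="http://snapshot.debian.org/archive/debian/$stamp/"
echo "$snapshot_url"
```

The resulting URL could then be used as the mirror when running debootstrap for the build chroot.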
Identified problems, and possible solutions
Build systems tend to capture information about the environment that makes them produce different results across different systems, despite having the same architecture and software installed.
Ideally, such variations should be fixed in the build system itself, but it might sometimes not be possible.
Files in data.tar.gz contains build paths
The build path is embedded in DWARF sections of ELF files, among other types of files generated during builds. This has proven a real headache to fix after the path has been captured.
We are thus going to make it mandatory to build packages in a directory named like /usr/src/debian/hello-2.8-1.
As a bonus, this means that it will be easier to unpack packages in this canonical location for use with tools looking at the source code like gdb.
Files in data.tar.gz depends on readdir order
The build system needs to be patched to sort directory listings.
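A minimal sketch of what sorting a directory listing means in practice, using a throwaway directory with made-up file names. Any build step that walks a directory could apply the same pattern.

```shell
#!/bin/sh
# Build a file manifest whose order does not depend on readdir().
set -e
tree=$(mktemp -d)
touch "$tree/zeta" "$tree/alpha" "$tree/Makefile"

# find returns entries in directory order; sorting in the C locale
# makes the result independent of both the filesystem and the locale.
manifest=$(cd "$tree" && find . -type f | LC_ALL=C sort)
echo "$manifest"
rm -rf "$tree"
```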
Epydoc
It looks like python-epydoc will produce links in an order that depends on the readdir order. This needs to be investigated.
Files in data.tar.gz varies with the locale
Builds should be made with LC_ALL=C.UTF-8.
It is quite impractical to force such a value in debian/rules, and there is actually no reason this should not be the default.
Actions:
- We could make dpkg-buildpackage export this variable, but we would need to change the policy to make dpkg-buildpackage the canonical way to build packages.
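To illustrate why the locale matters, the sketch below sorts the same input with a pinned locale and then exports the value suggested above. It only demonstrates the C collation order; other locales typically interleave upper and lower case, which is where build output starts to differ.

```shell
#!/bin/sh
# Collation depends on the locale, so the same input can sort
# differently on two build hosts unless LC_ALL is pinned.
sorted_c=$(printf '%s\n' b A a B | LC_ALL=C sort | tr '\n' ' ')
echo "LC_ALL=C: $sorted_c"

# Pin the locale for everything started from this shell, using the
# value recommended above.
LC_ALL=C.UTF-8
export LC_ALL
```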
Files in data.tar.gz contains hostname, uname output, username
Actions:
- We could write an LD_PRELOAD library that returns consistent results for several system calls, on the same model as libfaketime. Bdale suggested we call it liblietome.
Files in data.tar.gz contains timestamps
- Recommended solution:
  - Use the timestamp of the last debian/changelog entry as reference.
  - touch all files to the reference timestamp before building the binary packages.
  - gzip -n when gzipping anything
  - get rid of non-determinism (yup...)
- Alternate solutions:
  - (or) libfaketime (probably breaks some things) (sudo apt-get install faketime)
For the worse cases, we could record the calls to gettimeofday() on the first build and have something like libfaketime replay them on rebuilds.
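The recommended steps can be sketched as follows. The changelog content and the maintainer name are mock data for illustration, the trailer is parsed with sed rather than a dpkg tool to keep the sketch self-contained, and the `-d` and `-c` options used here assume GNU date, touch and stat.

```shell
#!/bin/sh
# Sketch: clamp file timestamps to the last debian/changelog entry and
# compress without embedding the file name or mtime.
set -e
src=$(mktemp -d)
mkdir "$src/debian"
# Mock changelog; only the trailer line matters here.
cat > "$src/debian/changelog" << 'EOF'
hello (2.8-4) unstable; urgency=low

  * Example entry.

 -- Jane Doe <jane@example.org>  Wed, 02 Jan 2013 10:00:00 +0000
EOF
echo 'some documentation' > "$src/README"

# The first trailer line belongs to the most recent entry.
ref_date=$(sed -n 's/^ -- .*>  //p' "$src/debian/changelog" | head -n 1)
ref_epoch=$(date -u -d "$ref_date" +%s)

# Reset every file to the reference timestamp.
find "$src" -exec touch -d "@$ref_epoch" {} +
readme_mtime=$(stat -c %Y "$src/README")

# gzip -n leaves the original name and mtime out of the gzip header.
first=$(gzip -n -c "$src/README" | sha256sum)
touch "$src/README"    # simulate a rebuild at a later time
second=$(gzip -n -c "$src/README" | sha256sum)
echo "$first"
echo "$second"
rm -rf "$src"
```

Both checksums should match even though the file was touched between the two compressions.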
.a files
.a files are ar archives. GNU ar has a deterministic mode which will use zero for UIDs, GIDs, timestamps, and use consistent file modes for all files.
binutils can be built with --enable-deterministic-archives to make it the default.
jar files
See ReproducibleBuilds/TimestampInJarFiles.
Epydoc
python-epydoc will add timestamps to the HTML file it produces. This needs to be fixed.
javadoc
javadoc will add timestamps to the HTML file it produces. This needs to be fixed.
PHP PEAR registry files
See ReproducibleBuilds/TimestampInPhpRegistryFiles.
Manpages
(TODO)
PDF files
(TODO)
Random variation
dh_ocaml md5sums files
See ReproducibleBuilds/RandomOrderInOcamlMd5sums.
Members of control.tar have varying mtime
We can fix this by giving tar the --mtime= option with the date of the last debian/changelog entry or a similar fixed point in time. Change to be done in dpkg-deb/build.c:do_build() around line 462.
Lunar's branch uses a single timestamp for all mtimes of tar members and allows presetting it during rebuilds, see below. However, this uncovered a new issue coming straight from tar itself.
For more consistency in tarfiles, the owner and group can be set using --owner=1000 --group=1000 --numeric-owner.
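The tar options mentioned above can be tried outside dpkg with a sketch like the one below. The fixed date is an arbitrary stand-in for the changelog date, the directory content is made up, and `--format=gnu` is added here as an assumption to keep extra timestamps out of the headers.

```shell
#!/bin/sh
# Sketch: build a tar archive whose metadata does not depend on when
# or by whom the files were created. All options are GNU tar options.
set -e
work=$(mktemp -d)
mkdir "$work/control"
echo 'Package: hello' > "$work/control/control"

build_tar() {
    tar -C "$work/control" --format=gnu \
        --mtime='2013-09-07 00:00:00 UTC' \
        --owner=1000 --group=1000 --numeric-owner \
        -cf - control
}

first=$(build_tar | sha256sum)
# Change the on-disk mtime, as a rebuild at another time would.
touch -d '2001-01-01 00:00:00 UTC' "$work/control/control"
second=$(build_tar | sha256sum)
echo "$first"
echo "$second"
rm -rf "$work"
```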
{data,control}.tar.{gz,xz,bz2} does not have timestamps
- dpkg 1.17.1 does not store a timestamp for the .gz versions of these files.
- *.xz and *.bz2 seem to provide no ability to store a timestamp.
{data,control}.tar.{gz,xz,bz2} will store files in readdir order
This is dependent on an accident of filesystem layout at build time, so it would sometimes not be reproducible.
We should probably fix this in dpkg by sorting the contents of the tar files.
Changes are discussed in 719845. Test case patch for pkg-tests. Patches that fork `sort` to get a stable order for files in control and data archives.
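Outside dpkg, the same idea can be illustrated by handing tar an explicitly sorted member list. This is only a sketch with a made-up directory tree and is not the patch discussed above.

```shell
#!/bin/sh
# Sketch: give tar a sorted member list instead of letting it walk the
# directory in readdir() order.
set -e
root=$(mktemp -d)
mkdir -p "$root/usr/bin" "$root/usr/share/doc"
touch "$root/usr/share/doc/README" "$root/usr/bin/hello"

# --no-recursion keeps tar from adding directory contents on its own,
# so members appear exactly in the order given on stdin.
(cd "$root" && find . -mindepth 1 | LC_ALL=C sort |
    tar --no-recursion -cf "$root.tar" -T -)

members=$(tar -tf "$root.tar")
echo "$members"
rm -rf "$root" "$root.tar"
```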
.deb ar-archive header contains a timestamp
.deb files are ar archives. The header currently contains the “current time”. It is written by dpkg at line 103 of lib/dpkg/ar.c.
Guillem said he would rather keep this.
Lunar's branch uses a single timestamp for all ar headers and allows presetting it during rebuilds, see below.
building the kernel
See dedicated page: SameKernel.
Custom build environment
We maintain a set of modifications to the toolchain to perform our experiments.
debhelper
The pu/reproducible_builds debhelper branch in the reproducible repository makes dh_strip call debugedit to adjust the source path embedded in debug objects.
dpkg
The pu/reproducible_builds dpkg branch in the reproducible repository:
- makes file order deterministic in the control and data parts of the .deb,
- uses a single timestamp for .deb mtimes and allows presetting the timestamp,
- adjusts dpkg-buildflags to pass -fno-merge-debug-strings in CFLAGS and CXXFLAGS.
binutils
binutils has been rebuilt with the --enable-deterministic-archives flag passed to ./configure.
Usage example
$ apt-get source hello
$ cd hello-2.8
$ dpkg-buildpackage
[…]
$ cp ../hello_2.8-4_amd64.deb ../hello_2.8-4_amd64.deb.orig
$ DEB_BUILD_TIMESTAMP=$(date +%s -d"$(sed -n -e 's/^Date: //p' ../hello_2.8-4_amd64.changes)") dpkg-buildpackage
[…]
$ sha256sum ../hello_2.8-4_amd64.deb ../hello_2.8-4_amd64.deb.orig
1e944abfceac7e593f6706da971e0444e5cee9aab680de5292d52661940ee9c4  ../hello_2.8-4_amd64.deb
1e944abfceac7e593f6706da971e0444e5cee9aab680de5292d52661940ee9c4  ../hello_2.8-4_amd64.deb.orig
Success!
bash script to compare two package builds
Usage: ./diffp r1/hello_2.8-4_amd64.changes r2/hello_2.8-4_amd64
The script is available in the misc.git repository.
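For readers without access to the repository, the general idea can be sketched as below. This is not the diffp script: `compare_builds` is a hypothetical name, it only checks whether same-named .deb files are byte-identical, and the demonstration uses stand-in files rather than real packages.

```shell
#!/bin/sh
# Sketch: report which .deb files differ between two result directories.
compare_builds() {
    status=0
    for first in "$1"/*.deb; do
        second="$2/$(basename "$first")"
        if [ -f "$second" ] && cmp -s "$first" "$second"; then
            echo "identical: $(basename "$first")"
        else
            echo "DIFFERENT: $(basename "$first")"
            status=1
        fi
    done
    return "$status"
}

# Demonstration with stand-in files instead of real packages.
demo=$(mktemp -d)
mkdir "$demo/r1" "$demo/r2"
echo same > "$demo/r1/a_1.0_all.deb"
echo same > "$demo/r2/a_1.0_all.deb"
echo one > "$demo/r1/b_1.0_all.deb"
echo two > "$demo/r2/b_1.0_all.deb"
report=$(compare_builds "$demo/r1" "$demo/r2") || true
echo "$report"
rm -rf "$demo"
```

A fuller comparison would also unpack differing packages and diff their members, which is where most of the useful diagnostics come from.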
How to build a deb using faketime
sudo apt-get install faketime
# Write a wrapper script; the quoted 'EOF' keeps "$@" from being expanded here.
cat > /tmp/fakeroot-faketime << 'EOF'
#!/bin/sh
faketime "2013-08-15T11:02:00" fakeroot "$@"
EOF
chmod a+x /tmp/fakeroot-faketime
dpkg-buildpackage -r/tmp/fakeroot-faketime
Note that this retains *one* timestamp, which is the timestamp of the 'ar' container of the *.deb. To erase that, somehow regenerate the package within the fakeroot-faketime environment by using dpkg-deb to unpack it, then dpkg-deb to repack it.
Note also that this is a total hack and not something I (AsheeshLaroia) think it makes sense to do on the Debian build daemons. In particular, some programs (e.g., gpg) hang forever when time does not advance.
Upstream changes may solve the problems we face with faketime 0.9.1. (rbalint) Faketime upstream has been improved to advance time linearly at a preset pace on each time() call and to save/load timestamps. We could try rebuilding many packages, saving timestamps in the first build and replaying them in successive builds. For example, gnupg 1.4.14-1 builds fine:
NO_FAKE_STAT=1 ~/projects/libfaketime.git/src/faketime -f '+0 i0.01' dpkg-buildpackage -rfakeroot -us -uc
Further work
Having reproducible builds allows us to trust binary packages better, because it becomes easier to have:
- diversity of buildd location and jurisdiction - build packages in more than one location, including the developer's
- diversity of buildd hardware, in case of hardware bugs, or malicious implants - a mix of VMs, some real hardware, different CPU manufacturers, different date of manufacture and supplier
- diversity of people - multiple signatures on a .changes file
- diversity of kernels, explained below
Kernel packages
Special features of kernel packages (including bootloaders and hypervisors) - GRUB2, Xen, linux, kfreebsd...
- we put huge trust in them - kernels are the ultimate target of any rootkit, able to completely hide from userland
- a kernel image built for amd64, if the build system is portable and reproducible enough, will be the same whether built from linux-amd64 or kfreebsd-amd64
- or maybe from different kernel versions - for example, a jessie build chroot on a wheezy host system
Then we would be better protected from something that could affect many systems at once, such as a kernel vulnerability; or widespread infection by a rootkit, which now must be compatible with more than one type of kernel to go unnoticed.
References
Mike Perry's discussion of how it took him eight weeks to make the Tor Browser Bundle have this feature: http://people.debian.org/~paulproteus/mike-perry-reproducible-tbb.txt
- Deterministic virtual machines:
"ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay" http://web.eecs.umich.edu/virtual/papers/dunlap02.pdf
"Debugging operating systems with time-traveling virtual machines" http://web.eecs.umich.edu/virtual/papers/king05_1.pdf
"A Particular Bug Trap: Execution Replay Using Virtual Machines" http://arxiv.org/pdf/cs.DC/0310030
"ReTrace: Collecting Execution Trace with Virtual Machine Deterministic Replay" http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.84.5732
"Execution Replay for Multiprocessor Virtual Machines" http://www.eecs.umich.edu/~pmchen/papers/dunlap08.slides.ppt
Fully Countering Trusting Trust through Diverse Double-Compiling (DDC) - a PhD dissertation on how to use reproducible builds to counter the "trusting trust" attack on compilers
Is that really the source code for this software? by Jos van den Oever on blogs.kde.org (2013-06-19). Compare reproducing tar from the Debian, Fedora and OpenSUSE packages.
Deterministic Builds Part One: Cyberwar and Global Compromise by Mike Perry
Deterministic Builds Part Two: Technical Details by Mike Perry
Colin Watson's answer on ubuntu-devel to “Will Ubuntu use "reproducible builds" as debian is planning to do?”
Presentations
Reproducible Builds for Debian, Distributions devroom, FOSDEM’14, Video, Slides (Sources)
Reproducible Builds, a year later, DebConf14, Slides (Sources)
Publicity
This section lists URLs, people, and dates for when other people have publicly expressed interest, or shared information about, the project.
Mike Perry, 2013-08-20: Deterministic Builds Part One: Cyberwar and Global Compromise
Jake Edge, 2013-08-21: Security software verifiability
Related projects
CARE monitors the execution of the specified command to create an archive that contains all the material required to re-execute it in the same context.