It should be possible to reproduce, byte for byte, every build of every package in Debian.

This is an ongoing project. To participate, we recommend that you create an account on this wiki, subscribe to this page, and join the reproducible-builds@lists.alioth.debian.org mailing list. The IRC channel is #debian-reproducible on OFTC.

Alioth project

Drivers

Why do we want reproducible builds?


Status

Useful things you (yes, you!) can do

If you want to help with this, feel free to ping the mailing list or edit this wiki page.

Reported bugs

All bugs relevant to the reproducible builds project should use usertags with user reproducible-builds@lists.alioth.debian.org.

Current usertags in use:

toolchain
affects a tool used by other package build systems
infrastructure
affects the whole Debian infrastructure or policies
timestamps
time of build is recorded during the build process
fileordering
build output varies with readdir() order
buildpath
path of sources is recorded during the build process
username
username is recorded during the build process
hostname
hostname is recorded during the build process
uname
uname output is recorded during the build process
randomness
some build aspects are dependent on (pseudo-)randomness
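
As a rough illustration of how these usertags could be applied, the Ruby sketch below builds the body of a mail for control@bugs.debian.org. The `user` and `usertags` commands are regular BTS control commands; the helper name `usertag_commands` and the bug number are hypothetical placeholders, not part of the project's tooling.

```ruby
# Hypothetical helper: returns the body of a mail to control@bugs.debian.org
# that adds project usertags to a bug.
def usertag_commands(bug_number, tags,
                     user: 'reproducible-builds@lists.alioth.debian.org')
  known = %w[toolchain infrastructure timestamps fileordering buildpath
             username hostname uname randomness]
  unknown = tags - known
  raise ArgumentError, "unknown usertag(s): #{unknown.join(', ')}" unless unknown.empty?

  ["user #{user}", "usertags #{bug_number} + #{tags.join(' ')}", 'thanks'].join("\n")
end

# 123456 is a placeholder, not a real bug number.
puts usertag_commands(123456, %w[timestamps fileordering])
```

Rejecting unknown tags up front is a design choice of this sketch: a misspelled usertag would otherwise silently create a new category in the BTS view.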

Control commands to update the view on the BTS.

Lintian tags

Here's a list of relevant Lintian tags:

Archive wide rebuilds

UDD query to get a list of core packages (172 as of 2014-09-19):

SELECT DISTINCT source
  FROM packages
 WHERE release = 'sid'
   AND section != 'debian-installer'
   AND (   essential = 'yes'
        OR build_essential = 'yes'
        OR priority IN ('required', 'important', 'standard')
       )
 ORDER BY source;


Reproducing builds

There are two sides to the problem: first we need to record the initial build environment, and then we need a way to set up the same environment.

Recording the environment

Information on a build will be recorded in a new control file with extension .buildinfo.

.buildinfo example

The following file would be named fweb_1.62-12_i386.buildinfo:

Format: 1.9
Build-Architecture: i386
Source: fweb
Version: 1.62-12
Binary: fweb fweb-doc
Architecture: all i386
Checksums-Sha256:
 9921500c4c6159c0019d4b8b600d2d06eef6b1da056abd2f78e66a9f0c3843b9 879 fweb_1.62-12.dsc
 3a7492c2013fbeebff08bee0514481ec0f56d2c4d138188d1ef85156d08ded00 436982 fweb-doc_1.62-12_all.deb
 a916dbb1c63707eaf52a5cdd10769871d2f621848176dc8f7ab4f0dcd999af85 229990 fweb_1.62-12_i386.deb
Build-Environment:
 acl (= 2.2.52-1),
 adduser (= 3.113+nmu3),
 base-files (= 7.5),
 base-passwd (= 3.5.33),
 bash (= 4.3-9),
 binutils (= 2.24.51.20140818-1),
 bsdmainutils (= 9.0.5),
 bsdutils (= 1:2.20.1-5.8),
 build-essential (= 11.7),
 bzip2 (= 1.0.6-7),
 coreutils (= 8.21-1.2),
 cpp (= 4:4.9.1-3),
 cpp-4.9 (= 4.9.1-9),
 dash (= 0.5.7-4),
 debconf (= 1.5.53),
 debhelper (= 9.20140817),
 debianutils (= 4.4),
 dh-buildinfo (= 0.11),
 diffutils (= 1:3.3-1),
 dmsetup (= 2:1.02.88-1),
 dpkg (= 1.17.13),
 dpkg-dev (= 1.17.13),
 e2fslibs (= 1.42.11-2),
 e2fsprogs (= 1.42.11-2),
 file (= 1:5.19-1),
 findutils (= 4.4.2-9),
 g++ (= 4:4.9.1-3),
 g++-4.9 (= 4.9.1-9),
 gcc (= 4:4.9.1-3),
 gcc-4.9 (= 4.9.1-9),
 gcc-4.9-base (= 4.9.1-9),
 gettext (= 0.19.2-1),
 gettext-base (= 0.19.2-1),
 grep (= 2.20-2),
 groff-base (= 1.22.2-6),
 gzip (= 1.6-3),
 hostname (= 3.15),
 init (= 1.21),
 initscripts (= 2.88dsf-53.4),
 insserv (= 1.14.0-5),
 intltool-debian (= 0.35.0+20060710.1),
 libacl1 (= 2.2.52-1),
 libasan1 (= 4.9.1-9),
 libasprintf0c2 (= 0.19.2-1),
 libatomic1 (= 4.9.1-9),
 libattr1 (= 1:2.4.47-1),
 libaudit1 (= 1:2.3.7-1),
 libaudit-common (= 1:2.3.7-1),
 libblkid1 (= 2.20.1-5.8),
 libbz2-1.0 (= 1.0.6-7),
 libc6 (= 2.19-10),
 libc6-dev (= 2.19-10),
 libcap2 (= 1:2.24-4),
 libcap2-bin (= 1:2.24-4),
 libc-bin (= 2.19-10),
 libc-dev-bin (= 2.19-10),
 libcilkrts5 (= 4.9.1-9),
 libcloog-isl4 (= 0.18.2-1),
 libcomerr2 (= 1.42.11-2),
 libcroco3 (= 0.6.8-3),
 libcryptsetup4 (= 2:1.6.6-1),
 libdb5.3 (= 5.3.28-6),
 libdbus-1-3 (= 1.8.6-2),
 libdebconfclient0 (= 0.191),
 libdevmapper1.02.1 (= 2:1.02.88-1),
 libdpkg-perl (= 1.17.13),
 libffi6 (= 3.1-2),
 libgcc1 (= 1:4.9.1-9),
 libgcc-4.9-dev (= 4.9.1-9),
 libgcrypt11 (= 1.5.4-2),
 libgcrypt20 (= 1.6.2-2),
 libgdbm3 (= 1.8.3-13),
 libglib2.0-0 (= 2.40.0-4),
 libgmp10 (= 2:6.0.0+dfsg-6),
 libgomp1 (= 4.9.1-9),
 libgpg-error0 (= 1.13-3),
 libintl-perl (= 1.23-1),
 libisl10 (= 0.12.2-2),
 libitm1 (= 4.9.1-9),
 libkmod2 (= 18-1),
 liblzma5 (= 5.1.1alpha+20120614-2),
 libmagic1 (= 1:5.19-1),
 libmount1 (= 2.20.1-5.8),
 libmpc3 (= 1.0.2-1),
 libmpfr4 (= 3.1.2-1),
 libncurses5 (= 5.9+20140712-2),
 libncurses5-dev (= 5.9+20140712-2),
 libncursesw5 (= 5.9+20140712-2),
 libpam0g (= 1.1.8-3.1),
 libpam-modules (= 1.1.8-3.1),
 libpam-modules-bin (= 1.1.8-3.1),
 libpam-runtime (= 1.1.8-3.1),
 libpcre3 (= 1:8.35-3),
 libpipeline1 (= 1.3.0-1),
 libprocps3 (= 1:3.3.9-7),
 libquadmath0 (= 4.9.1-9),
 libselinux1 (= 2.3-1),
 libsemanage1 (= 2.3-1),
 libsemanage-common (= 2.3-1),
 libsepol1 (= 2.3-1),
 libss2 (= 1.42.11-2),
 libstdc++-4.9-dev (= 4.9.1-9),
 libstdc++6 (= 4.9.1-9),
 libsystemd-journal0 (= 208-8),
 libsystemd-login0 (= 208-8),
 libtext-unidecode-perl (= 0.04-2),
 libtimedate-perl (= 2.3000-2),
 libtinfo5 (= 5.9+20140712-2),
 libtinfo-dev (= 5.9+20140712-2),
 libubsan0 (= 4.9.1-9),
 libudev1 (= 208-8),
 libunistring0 (= 0.9.3-5.2),
 libustr-1.0-1 (= 1.0.4-3),
 libuuid1 (= 2.20.1-5.8),
 libwrap0 (= 7.6.q-25),
 libxml2 (= 2.9.1+dfsg1-4),
 libxml-libxml-perl (= 2.0116+dfsg-1+b1),
 libxml-namespacesupport-perl (= 1.11-1),
 libxml-sax-base-perl (= 1.07-1),
 libxml-sax-perl (= 0.99+dfsg-2),
 linux-libc-dev (= 3.14.15-2),
 login (= 1:4.2-2+b1),
 lsb-base (= 4.1+Debian13),
 make (= 4.0-8),
 man-db (= 2.6.7.1-1),
 mawk (= 1.3.3-17),
 mount (= 2.20.1-5.8),
 ncurses-base (= 5.9+20140712-2),
 ncurses-bin (= 5.9+20140712-2),
 passwd (= 1:4.2-2+b1),
 patch (= 2.7.1-6),
 perl (= 5.20.0-4),
 perl-base (= 5.20.0-4),
 perl-modules (= 5.20.0-4),
 po-debconf (= 1.0.16+nmu3),
 procps (= 1:3.3.9-7),
 sed (= 4.2.2-4),
 sensible-utils (= 0.0.9),
 startpar (= 0.59-3),
 systemd (= 208-8),
 systemd-sysv (= 208-8),
 sysvinit-utils (= 2.88dsf-53.4),
 sysv-rc (= 2.88dsf-53.4),
 tar (= 1.27.1-2),
 texinfo (= 5.2.0.dfsg.1-4),
 tzdata (= 2014f-1),
 ucf (= 3.0030),
 udev (= 208-8),
 util-linux (= 2.20.1-5.8),
 xz-utils (= 5.1.1alpha+20120614-2),
 zlib1g (= 1:1.2.8.dfsg-2)

.buildinfo field descriptions

Format

Same as in .changes.

Build-Architecture

The Debian machine architecture that was used to perform the build.

Source

Same as in .changes.

Version

Same as in .changes.

Binary

Same as in .changes.

Architecture

Same as in .changes, except that source should not be specified: only concrete architectures are allowed, no wildcards or any.

Checksums-Sha256

Same format as other control files. Must list the .dsc file and all files listed in `debian/files`.

Build-Environment

List of all packages forming the build environment, their architecture if different from the build architecture, and their version. This includes Essential packages, build-essential, and the packages listed in Build-Depends and Build-Depends-Indep. For each package, its dependencies should be listed recursively. The format is the same as Built-Using.

The content of the Build-Environment is close to what dh-buildinfo currently produces.
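
To make the field format concrete, here is a minimal Ruby sketch that parses a Build-Environment value into package entries. The method name `parse_build_environment` is hypothetical, and the optional `name:arch` qualifier is an assumption borrowed from Debian dependency syntax; this page only states that the format matches Built-Using.

```ruby
# Parses the value of a Build-Environment field into an array of hashes
# with :name, :arch (nil when no qualifier is present) and :version.
# ASSUMPTION: a foreign architecture is written as "name:arch (= version)".
def parse_build_environment(value)
  value.split(',').map(&:strip).reject(&:empty?).map do |entry|
    match = entry.match(/\A([a-z0-9][a-z0-9+.-]*)(?::([a-z0-9-]+))?\s*\(=\s*([^)\s]+)\s*\)\z/)
    raise ArgumentError, "unparsable entry: #{entry.inspect}" unless match

    { name: match[1], arch: match[2], version: match[3] }
  end
end

field = "acl (= 2.2.52-1),\n bsdutils (= 1:2.20.1-5.8),\n libstdc++6 (= 4.9.1-9)"
parse_build_environment(field).each { |pkg| puts "#{pkg[:name]} #{pkg[:version]}" }
```

Only strict `(= version)` relations are accepted, since the point of the field is to pin exact versions.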

.buildinfo signatures

As .buildinfo files are meant to make it possible to reproduce a given build, multiple parties should be able to assess their content. .buildinfo files are thus signed using detached signatures, with the full fingerprint of the key in the filename.

Example file list:

hello_2.8-1_amd64.buildinfo
hello_2.8-1_amd64.buildinfo.0603CCFD91865C17E88D4C798382C95C29023DF9.asc
hello_2.8-1_amd64.buildinfo.0EE5BE979282D80B9F7540F1CCD2ED94D21739E9.asc
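
The naming scheme shown in the file list can be sketched in Ruby as follows. Both method names are hypothetical, and the 40-hex-digit check is an assumption based on the fingerprints used in the example.

```ruby
# Returns the name of the detached signature made with the given key.
# ASSUMPTION: fingerprints are 40 hexadecimal digits, as in the example.
def signature_filename(buildinfo, fingerprint)
  normalized = fingerprint.delete(' ').upcase
  unless normalized.match?(/\A[0-9A-F]{40}\z/)
    raise ArgumentError, 'expected a full 40-hex-digit fingerprint'
  end

  "#{buildinfo}.#{normalized}.asc"
end

# Returns the fingerprints of all keys that signed the given .buildinfo,
# judging only by the file names found next to it.
def signers_of(buildinfo, filenames)
  prefix = "#{buildinfo}."
  filenames.map do |name|
    next unless name.start_with?(prefix) && name.end_with?('.asc')

    fingerprint = name[prefix.length...-4] # drop the prefix and ".asc"
    fingerprint if fingerprint.match?(/\A[0-9A-F]{40}\z/)
  end.compact
end
```

Listing signers from file names alone says nothing about whether the signatures are valid; they would still have to be verified with the corresponding keys.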

Inclusion of .buildinfo in the archive

.buildinfo files will be referenced by their assessors through a Build-Signed-Off-By field in the Packages index.

Example:

Package: hello
Version: 2.9-1
Installed-Size: 140
Maintainer: Santiago Vila <sanvila@debian.org>
Architecture: i386
[…]
Build-Signed-Off-By:
 0603CCFD91865C17E88D4C798382C95C29023DF9 Jérémy Bobbio <lunar@debian.org>,
 D54C3BFAFFB042DE382DA5D741CE7F0B9F1B8B32 Santiago Vila <sanvila@debian.org> 

Previous ideas

.changes files looked like a good place to record the environment, as they list the checksums of the build products and are signed by either the maintainer or the buildd operator.

But the meaning of .changes files is pretty clear: they describe a transactional change operation on the archive. They are not saved directly in the archive: they are the equivalent of a log entry. The name of a .changes file is also not specified, and multiple operations can have the same name.

(See also 719854 for the first attempt which tried using XC- field in debian/control.)

Reproduce the build environment

Actions:

Ruby script that generates URLs to .deb files on snapshot.debian.org from a list of binary packages and their respective versions: http://people.debian.org/~paulproteus/lunar-verify-script.rb

Here's another potential piece of the puzzle. The following script will convert an RFC 822 date (as found in a .changes file) to the URL of the last known archive state recorded by snapshot.debian.org. This might be useful to debootstrap the proper chroot before installing packages…

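require 'date'
require 'uri'
require 'net/http'
require 'nokogiri' # third-party gem, used to parse the HTML index

# Date of the build, as found in the Date field of a .changes file.
changes_date = 'Mon, 30 Jan 2012 12:52:28 +0100'

build_date = DateTime.rfc822(changes_date)
# Index of the archive states recorded during the month of the build.
url = "http://snapshot.debian.org/archive/debian/?year=#{build_date.year}&month=#{build_date.month}"
response = Net::HTTP.get_response(URI.parse(url))

# The loop relies on the links being listed in chronological order:
# it keeps the last one that predates the build.
run = nil
doc = Nokogiri::HTML(response.body)
doc.css('p a').each do |link|
  date = DateTime.parse(link.content)
  break if date >= build_date
  run = link['href']
end
# No earlier state in that month's index: fail instead of printing a bogus URL.
abort "no archive state found before #{changes_date} in that month" if run.nil?
puts "http://snapshot.debian.org/archive/debian/#{run}"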

Note: it would probably be much better to add a new query to the machine interface of snapshot.d.o instead of parsing HTML.


Identified problems, and possible solutions

Build systems tend to capture information about the environment that makes them produce different results across different systems, despite having the same architecture and software installed.

Ideally, such variations should be fixed in the build system itself, but that is sometimes not possible.

Files in data.tar.gz contain build paths

The build path is embedded in DWARF sections of ELF files, among other types of files generated during builds. This has proven a real headache to fix after the path has been captured.

We are thus going to make it mandatory to build packages in a directory named like /usr/src/debian/hello-2.8-1.

As a bonus, this means that it will be easier to unpack packages in this canonical location for use with tools looking at the source code like gdb.

Files in data.tar.gz depend on readdir order

The build system needs to be patched to sort directory listings.
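
The kind of patch this calls for usually amounts to sorting before iterating. A minimal Ruby sketch, assuming a hypothetical helper named `stable_listing`:

```ruby
require 'tmpdir'
require 'fileutils'

# Dir.children returns entries in whatever order the filesystem hands them
# out (readdir order). Sorting the names bytewise makes the result
# independent of that order and of the current locale.
def stable_listing(dir)
  Dir.children(dir).sort
end

Dir.mktmpdir do |dir|
  %w[zeta alpha Makefile].each { |name| FileUtils.touch(File.join(dir, name)) }
  p stable_listing(dir) # prints ["Makefile", "alpha", "zeta"]
end
```

Ruby's String comparison is bytewise, which is why "Makefile" sorts before the lowercase names; a build system written in another language would have to make sure its sort does not depend on the locale either.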

Epydoc

It looks like python-epydoc will produce links in an order that depends on the readdir order. This needs to be investigated.

Files in data.tar.gz vary with the locale

Builds should be made with LC_ALL=C.UTF-8.

It is quite impractical to force such a value in debian/rules, and there is actually no reason this should not be the default.
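
Until that becomes the default, a build driver can set the variable for the commands it spawns. The Ruby sketch below shows one way to do so; `run_in_build_locale` is a hypothetical name, and nothing here reflects how the Debian build tools actually set their environment.

```ruby
require 'open3'

# Runs a command with LC_ALL forced to C.UTF-8 and returns its standard
# output. LC_ALL takes precedence over LANG and the other LC_* variables,
# so the rest of the caller's environment is passed through untouched.
def run_in_build_locale(*command)
  output, status = Open3.capture2({ 'LC_ALL' => 'C.UTF-8' }, *command)
  raise "#{command.first} exited with status #{status.exitstatus}" unless status.success?

  output
end

puts run_in_build_locale('sh', '-c', 'echo "LC_ALL is $LC_ALL"')
```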

Actions:

Files in data.tar.gz contain hostname, uname output, username

We could write an LD_PRELOAD library that returns consistent results for several system calls, on the same model as libfaketime. Bdale suggested we call it liblietome.

But we can also consider that no build system should capture such information or produce different builds depending on it, and fix those that do.

Files in data.tar contain timestamps

Recommended solutions in order of preference:

.a files

.a files are ar archives. GNU ar has a deterministic mode which uses zero for UIDs, GIDs, and timestamps, and consistent file modes for all files.

strip-nondeterminism will normalize .a files the same way GNU ar does when run in deterministic mode.
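
To show what such normalization amounts to at the byte level, here is a minimal Ruby sketch that writes an ar archive with zeroed mtime, UID and GID and a fixed mode of 0644. It follows the common ar header layout with GNU-style `name/` terminators; the method name `deterministic_ar` is hypothetical, and long member names and symbol tables are deliberately left out.

```ruby
# Builds an ar archive in memory from [name, data] pairs, keeping the
# order given by the caller. Every header carries mtime 0, uid 0, gid 0
# and mode 0644, so the bytes depend only on member names and contents.
# Simplification: names must fit the 16-byte name field (15 chars + "/").
def deterministic_ar(members)
  archive = "!<arch>\n".b
  members.each do |name, data|
    raise ArgumentError, "name too long: #{name}" if name.bytesize > 15

    data = data.b
    # 60-byte header: name, mtime, uid, gid, mode (octal), size, end marker.
    header = format("%-16s%-12d%-6d%-6d%-8o%-10d`\n",
                    "#{name}/", 0, 0, 0, 0o644, data.bytesize)
    archive << header.b << data
    archive << "\n" if data.bytesize.odd? # members are padded to an even size
  end
  archive
end

archive = deterministic_ar([['hello.o', 'object code'], ['util.o', 'more']])
puts archive.bytesize
```

Running the method twice on the same input yields identical bytes, whatever the ownership or modification times of the original files were.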

jar files

See ReproducibleBuilds/TimestampInJarFiles.

strip-nondeterminism will normalize jar files.

Epydoc

python-epydoc will add timestamps to the HTML files it produces. This needs to be fixed.

javadoc

javadoc will add timestamps to the HTML files it produces.

strip-nondeterminism will normalize jar files.

PHP PEAR registry files

See ReproducibleBuilds/TimestampInPhpRegistryFiles.

HTML documentation generated by Doxygen

If Doxyfile contains HTML_TIMESTAMP = YES, Doxygen will add a timestamp to its generated documentation.
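
A package's Doxyfile can be checked for this setting mechanically. The Ruby sketch below is a minimal example with a hypothetical method name; it only detects an explicit `HTML_TIMESTAMP = YES` and does not model what Doxygen does when the option is absent.

```ruby
# Returns true when the given Doxyfile text explicitly sets
# HTML_TIMESTAMP to YES. When the option appears several times, the last
# assignment wins; comments introduced by "#" are ignored.
def doxygen_timestamp_enabled?(doxyfile_text)
  enabled = false
  doxyfile_text.each_line do |line|
    setting = line.sub(/#.*/, '').strip
    match = setting.match(/\AHTML_TIMESTAMP\s*=\s*(\S+)\z/)
    enabled = match[1].casecmp?('YES') if match
  end
  enabled
end

puts doxygen_timestamp_enabled?("PROJECT_NAME = demo\nHTML_TIMESTAMP = YES\n")
```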

A good amount of potentially affected packages can be found using codesearch.

Members of control.tar and data.tar have varying mtimes

dpkg-deb will record the mtime of files it packs in control.tar and data.tar. This is bad as most of these files are generated during the build process and will thus change with each build.

759886 contains a patch against debhelper that adds a new dh_fixmtimes helper. It ensures that any file created after the date of the latest changelog entry has its mtime set to the date of that entry.
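
The clamping idea can be sketched in a few lines of Ruby. This is not the dh_fixmtimes patch itself; `clamp_mtimes` is a hypothetical name, and the decision to touch directories as well as regular files is an assumption of this sketch.

```ruby
require 'find'

# Sets the mtime of every regular file and directory under root that is
# newer than reference_time back to reference_time, and returns the sorted
# list of paths it changed. Symlinks are skipped because File.utime would
# follow them and modify their target instead.
def clamp_mtimes(root, reference_time)
  changed = []
  Find.find(root) do |path|
    stat = File.lstat(path)
    next unless stat.file? || stat.directory?
    next unless stat.mtime > reference_time

    File.utime(stat.atime, reference_time, path)
    changed << path
  end
  changed.sort
end
```

Files older than the reference time keep their mtime, so files shipped unmodified from the source package still carry their original dates.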

{data,control}.tar.{gz,xz,bz2} will store files in readdir order

This depends on an accident of filesystem layout at build time, so the result is not always reproducible.

We should probably fix this in dpkg by sorting the contents of the tar files.

Changes are discussed in 719845. Test case patch for pkg-tests. Patches that fork `sort` to get a stable order for files in control and data archives.

.deb ar-archive header contains a timestamp

.deb files are ar archives. The header currently contains the “current time”.

759999 contains patches against dpkg that will preset the timestamp to the time of the latest entry of debian/changelog when a package is built using dpkg-buildpackage.
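
For illustration, the time of the latest changelog entry can be extracted as in the Ruby sketch below. It is not taken from the dpkg patches; the method name is hypothetical, and the parsing relies only on the standard trailer line of debian/changelog entries (" -- Name <email>", two spaces, then the date).

```ruby
require 'time'

# Returns the date of the most recent entry of a debian/changelog as a
# Time. Entries are listed newest first, so the first trailer line wins.
def latest_changelog_time(changelog_text)
  changelog_text.each_line do |line|
    match = line.match(/\A -- .*>  (.+?)\s*\z/)
    return Time.rfc2822(match[1]) if match
  end
  raise ArgumentError, 'no changelog trailer line found'
end
```

Calling `latest_changelog_time(File.read('debian/changelog')).to_i` would give the value as seconds since the epoch, which is the form an ar header stores.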

Building the kernel

See dedicated page: SameKernel.


Custom build environment

We maintain a set of modifications to the toolchain to perform our experiments. Commit notifications are sent to a dedicated mailing list.

debhelper

The pu/reproducible_builds debhelper branch in the reproducible project contains:

dpkg

The pu/reproducible_builds dpkg branch in the reproducible repository:

  1. makes the file order deterministic in the control and data parts of the .deb,
  2. uses a single timestamp for .deb ar members,
  3. presets the aforementioned timestamp to the date of the latest changelog entry.

strip-nondeterminism

strip-nondeterminism is a post-processing tool that will normalize various file types. dh_strip_nondeterminism will be run by debhelper at the end of the build process.

dh-python

dh-python needs a patch for stable ordering of control variables. See 759231 and the pu/reproducible_builds branch.

discount

discount needs a patch to produce stable output of email addresses. See 762622 and the pu/reproducible_builds branch.

Usage example

If you already have a pbuilder set up, it's fairly easy to set up an environment with the custom toolchain:

sudo cp /var/cache/pbuilder/base.tgz /var/cache/pbuilder/base-reproducible.tgz
sudo pbuilder --login --save-after-exec --basetgz /var/cache/pbuilder/base-reproducible.tgz
echo 'deb http://reproducible.alioth.debian.org/debian/ ./' > /etc/apt/sources.list.d/reproducible.list
apt-get update
apt-get install dpkg dpkg-dev debhelper dh-python discount
exit 0

Once that's done, rebuilding a package can be done through:

apt-get source --download-only acl
sudo pbuilder --build --basetgz /var/cache/pbuilder/base-reproducible.tgz acl_*.dsc
mkdir b1 b2
dcmd cp /var/cache/pbuilder/result/acl_*.changes b1
sudo dcmd rm /var/cache/pbuilder/result/acl_*.changes
sudo pbuilder --build --basetgz /var/cache/pbuilder/base-reproducible.tgz acl_*.dsc
dcmd cp /var/cache/pbuilder/result/acl_*.changes b2
sudo dcmd rm /var/cache/pbuilder/result/acl_*.changes

diffp is useful to check the result:

../misc/diffp b1/*.changes b2/*.changes


bash script to compare two package builds

Usage: ./diffp r1/hello_2.8-4_amd64.changes r2/hello_2.8-4_amd64

The script is available in the misc.git repository.
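
For readers who only want a quick yes-or-no answer before reaching for diffp, comparing checksums of the build products is enough. The Ruby sketch below is not the diffp script; the method names are hypothetical, and it assumes each build's products were copied into their own directory, like b1 and b2 above.

```ruby
require 'digest'

# Maps each regular file directly inside dir to its SHA-256 checksum.
def build_checksums(dir)
  Dir.children(dir).sort.each_with_object({}) do |name, sums|
    path = File.join(dir, name)
    sums[name] = Digest::SHA256.file(path).hexdigest if File.file?(path)
  end
end

# Returns the sorted names of files that differ between the two builds,
# including files present in only one of them.
def differing_files(dir_a, dir_b)
  a = build_checksums(dir_a)
  b = build_checksums(dir_b)
  (a.keys | b.keys).sort.reject { |name| a[name] == b[name] }
end
```

An empty result from `differing_files('b1', 'b2')` means the two builds are byte-for-byte identical. Note that a .changes file copied along with the products embeds their checksums, so it differs whenever one of the products does.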


How to build a deb using faketime

sudo apt-get install faketime
# Write a wrapper script; quoting 'EOF' keeps "$@" from being expanded now.
cat > /tmp/fakeroot-faketime << 'EOF'
#!/bin/sh
faketime "2013-08-15T11:02:00" fakeroot "$@"
EOF
chmod a+x /tmp/fakeroot-faketime
dpkg-buildpackage -r/tmp/fakeroot-faketime

Note that this retains *one* timestamp, which is the timestamp of the 'ar' container of the *.deb. To erase that, somehow regenerate the package within the fakeroot-faketime environment by using dpkg-deb to unpack it, then dpkg-deb to repack it.

Note also that this is a total hack and not something I (AsheeshLaroia) think makes sense to do on the Debian build daemons. In particular, some programs (e.g., gpg) hang forever when time does not advance.

Upstream changes may solve the problems we face with faketime 0.9.1. (rbalint) Faketime upstream has been improved to advance time linearly at a preset pace for each time() call and to save and load timestamps. We could try rebuilding many packages, saving timestamps in the first build and replaying them in successive builds. For example, gnupg 1.4.14-1 builds fine:

NO_FAKE_STAT=1  ~/projects/libfaketime.git/src/faketime -f '+0 i0.01' dpkg-buildpackage -rfakeroot -us -uc


Further work

Having reproducible builds allows us to trust binary packages better, because it becomes easier to have:

Kernel packages

Special features of kernel packages (including bootloaders and hypervisors) - GRUB2, Xen, linux, kfreebsd...

Then we would be better protected from something that could affect many systems at once, such as a kernel vulnerability; or widespread infection by a rootkit, which now must be compatible with more than one type of kernel to go unnoticed.


References

Presentations

Publicity

This section lists URLs, people, and dates for when other people have publicly expressed interest in, or shared information about, the project.