It should be possible to reproduce, byte for byte, every build of every package in Debian.

For now, we will start with a few maintainers who want to opt in to this goal as we flesh out the details of what will make it possible. This page tracks our progress.

Drivers

Why do we want reproducible builds?

Others?

Status

Use cases

Detailed package status list

Reproducing builds

There are two sides to the problem: first we need to record the initial build environment, and then we need a way to set up the same environment.

Recording the environment

The right place to record the build environment is the .changes file. Rationale: it lists the checksums of the build products and is signed by either the maintainer or the buildd operator.

To add a field to the .changes file, we need to call dpkg-buildpackage using something like:

dpkg-buildpackage --changes-option="-DBuild-Environment=$(
    COLUMNS=999 dpkg -l | awk '
            /^ii/ { ORS=", "; print $2 " (= " $3 ")" }' |
        sed -e 's/, $//'
)"

The idea is not new; see 138409. The above could eventually be integrated into dpkg proper if our experiments turn out to be successful.

(See 719854 for the first attempt which tried using XC- field in debian/control.)
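As a rough sketch of the same idea, the package list could also be produced from dpkg-query's machine-readable output instead of parsing the columns of dpkg -l. The helper name format_build_env is hypothetical, and the tab-separated input format is an assumption made for this example, not something dpkg mandates.

```shell
# Hypothetical helper: read "package<TAB>version" lines on stdin and
# print a single comma-separated "package (= version)" list.
format_build_env() {
    awk -F '\t' '
        NF == 2 { printf "%s%s (= %s)", sep, $1, $2; sep = ", " }
        END     { if (sep != "") printf "\n" }'
}

# Possible usage on a Debian system (keeps only fully installed packages):
#   dpkg-query -W -f='${db:Status-Abbrev}\t${binary:Package}\t${Version}\n' |
#       awk -F '\t' '$1 ~ /^ii/ { print $2 "\t" $3 }' | format_build_env
```

The output has the same shape as the value built by the awk and sed pipeline above, so it could be passed to -DBuild-Environment= in the same way.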

Reproduce the build environment

Actions:

Ruby script that generates URL to .deb on snapshot.debian.org from a list of binary packages and their respective version: http://people.debian.org/~paulproteus/lunar-verify-script.rb
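For illustration only, here is a different and much simpler approach than the Ruby script: turning a recorded Build-Environment value back into package=version arguments of the kind apt-get install accepts. The helper name build_env_to_pins is hypothetical, and the sketch assumes the field uses the "package (= version)" format shown earlier on this page.

```shell
# Hypothetical helper: convert "pkg (= ver), pkg2 (= ver2)" read on
# stdin into one "pkg=ver" word per line.
build_env_to_pins() {
    tr ',' '\n' |
        sed -n -e 's/^ *\([^ ]*\) (= \([^)]*\)) *$/\1=\2/p'
}

# Possible usage, assuming an APT source that still carries the
# recorded versions (for example a snapshot.debian.org archive):
#   apt-get install $(build_env_to_pins < build-environment.txt)
```

Entries that do not match the expected format are silently dropped, so a real tool would want to report them instead.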

Known bugs we are waiting on

Different problems, and their solutions

Build systems tend to capture information about the environment, which makes them produce different results across different systems despite these having the same architecture and software installed.

Ideally, such variations should be fixed in the build system itself, but it might sometimes not be possible.

Non-problems

Files in data.tar.gz contain build paths

These should really be patched out in one way or another. This is not useful information and can actually hide real bugs.

For debug files, use debugedit.

Files in data.tar.gz depend on readdir order

The build system needs to be patched to sort directory listings.
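When the build system gathers its inputs with a shell command, the fix can be as small as piping the listing through sort. A minimal sketch, with a hypothetical helper name; it assumes file names without embedded newlines:

```shell
# Hypothetical helper: list regular files under directory $1 in a
# stable order that does not depend on readdir or on the locale.
sorted_listing() {
    (cd "$1" && find . -type f -print | LC_ALL=C sort)
}
```

Forcing LC_ALL=C for the sort matters here, since collation rules differ between locales and would otherwise reintroduce variation.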

Files in data.tar.gz vary with the locale

Builds should be made with LC_ALL=C.UTF-8.

It's quite impractical to force such a value in debian/rules, and there is actually no reason why this should not be the default.

Actions:

Files in data.tar.gz contain hostname, uname output, username

Actions:

for several system calls on the same model as libfaketime. Bdale suggested we call it liblietome.

Files in data.tar.gz contains timestamps

For the worst cases, we could record the calls to gettimeofday() on the first build and have something like libfaketime replay them on rebuilds.

Members of control.tar have varying mtime

We can fix this by giving tar the --mtime= option with the date of the last debian/changelog entry or a similar fixed point in time. Change to be done in dpkg-deb/build.c:do_build() around line 462.
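The effect of --mtime= can be tried outside dpkg-deb. The sketch below assumes GNU tar; the helper name is hypothetical and the date is supplied by the caller rather than read from debian/changelog.

```shell
# Hypothetical helper: pack directory $2 into archive $1, giving every
# member the fixed modification time $3 (any date string GNU tar accepts).
tar_fixed_mtime() {
    tar --mtime="$3" -cf "$1" -C "$2" .
}

# Possible usage, assuming dpkg-parsechangelog from dpkg-dev is available:
#   tar_fixed_mtime control.tar DEBIAN "$(dpkg-parsechangelog -S Date)"
```

Two archives built this way from the same files should be identical even if the files were touched in between, as long as member order and ownership are unchanged.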

{data,control}.tar.{gz,xz,bz2} may have timestamps

{data,control}.tar.{gz,xz,bz2} will store files in readdir order

This is dependent on an accident of filesystem layout at build time, so it would sometimes not be reproducible.

We should probably fix this in dpkg by sorting the contents of the tar files.

For control.tar, we need to feed tar a sorted list of files in dpkg-deb/build.c:do_build() around line 462.

For data.tar, we need to sort the output of find in dpkg-deb/build.c:do_build() around line 571.
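The same approach can be sketched in shell: feed tar an explicitly sorted, NUL-separated file list and turn off its own directory recursion. This assumes GNU find, sort and tar; the helper name is hypothetical and this is not the code used by dpkg-deb.

```shell
# Hypothetical helper: pack directory $2 into archive $1 with members
# stored in sorted order instead of readdir order.
tar_sorted() {
    (cd "$2" && find . -print0 | LC_ALL=C sort -z |
        tar --null --no-recursion -T - -cf -) > "$1"
}
```

--no-recursion is needed because find already lists every entry; without it tar would descend into each directory again and add its contents a second time, in readdir order.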

References