It should be possible to reproduce, byte for byte, every build of every package in Debian.

For now, we will start with a few maintainers who want to opt in to this goal as we flesh out the details of what will make it possible. This page tracks our progress.

To participate in the project, we recommend you create an account on this wiki, and then "watch" this page.

Drivers

Why do we want reproducible builds?

Others?

Status

Useful things you (yes, you!) can do

If you want to help with this, join #debian-devel and ping paulproteus (Asheesh) or tumbleweed (Stefano) or the other people listed on this page.

Use cases

Detailed package status list

Reproducing builds

There are two sides to the problem: first we need to record the initial build environment, and then we need a way to set up the same environment.

Recording the environment

The right place to record the build environment is the .changes file. Rationale: it lists the checksums of the build products and is signed by either the maintainer or the buildd operator.

To add a field to the .changes file, we need to call dpkg-buildpackage using something like:

dpkg-buildpackage --changes-option="-DBuild-Environment=$(
COLUMNS=999 dpkg -l | awk '
            /^ii/ { ORS=", "; print $2 " (= " $3 ")" }' |
        sed -e 's/, $//'
)"

The idea is not new, see bug #138409. The above could eventually be integrated into dpkg proper if our experiments prove successful.

(See bug #719854 for the first attempt, which tried using an XC- field in debian/control.)
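The awk step of the command above can be tried in isolation. The following is a minimal sketch that runs on canned input rather than a live dpkg database; the helper name format_build_environment and the sample package lines are illustrative assumptions, not part of dpkg.

```shell
#!/bin/sh
# format_build_environment is a hypothetical helper: it reads lines in the
# layout printed by `dpkg -l` on stdin and prints a comma-separated list of
# "package (= version)" entries, keeping only installed ("ii") packages.
format_build_environment() {
    awk '/^ii/ { printf "%s%s (= %s)", sep, $2, $3; sep = ", " }
         END   { print "" }'
}

# Illustrative sample standing in for the output of: COLUMNS=999 dpkg -l
sample='ii  gcc-4.8   4.8.1-9   amd64  GNU C compiler
rc  oldpkg    1.0-1     amd64  removed package, configuration remains
ii  make      3.81-8.2  amd64  utility for directing compilation'

build_env=$(printf '%s\n' "$sample" | format_build_environment)
printf '%s\n' "$build_env"
```

Building the separator inside awk avoids the trailing ", " that the sed step in the command above has to strip.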

Reproduce the build environment

Actions:

A Ruby script that generates the snapshot.debian.org URLs of .deb files from a list of binary packages and their respective versions: http://people.debian.org/~paulproteus/lunar-verify-script.rb

Here's another potential piece of the puzzle. The following script converts an RFC 822 date (as found in a .changes file) to the URL of the last archive state recorded by snapshot.debian.org before that date. This might be useful to debootstrap the proper chroot before installing packages…

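require 'date'
require 'uri'
require 'net/http'
require 'nokogiri'

# Date field as found in a .changes file (RFC 822 format).
changes_date = 'Mon, 30 Jan 2012 12:52:28 +0100'

build_date = DateTime.rfc822(changes_date)
# Fetch the snapshot.debian.org index page for the month of the build.
url = "http://snapshot.debian.org/archive/debian/?year=#{build_date.year}&month=#{build_date.month}"
response = Net::HTTP.get_response(URI.parse(url))

# Walk the listed archive runs in order and keep the last one
# recorded before the build date.
run = nil
doc = Nokogiri::HTML(response.body)
doc.css('p a').each do |link|
  date = DateTime.parse(link.content)
  break if date >= build_date
  run = link['href']
end
abort 'no archive run found before the build date' if run.nil?
puts "http://snapshot.debian.org/archive/debian/#{run}"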

Note: it would probably be much better to add a new query to the machine interface of snapshot.debian.org than to parse HTML.
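The selection step of the script above (keep the last run recorded before the build date) can be sketched in shell without any network access. This assumes GNU date and assumes runs are named with fixed-width UTC stamps of the form YYYYMMDDTHHMMSSZ; the helper names and the sample run list are hypothetical.

```shell
#!/bin/sh
# to_snapshot_stamp (hypothetical): convert an RFC 822 date to a UTC stamp
# of the form YYYYMMDDTHHMMSSZ. Relies on GNU date for the -d option.
to_snapshot_stamp() {
    date -u -d "$1" +%Y%m%dT%H%M%SZ
}

# last_run_before (hypothetical): read one stamp per line on stdin and print
# the last one strictly earlier than $1. Fixed-width UTC stamps compare
# correctly as plain strings, so the comparison is forced to string mode.
last_run_before() {
    LC_ALL=C sort |
        awk -v limit="$1" '($0 "") < (limit "") { run = $0 }
                           END { if (run != "") print run }'
}

# Illustrative run list; real values would come from snapshot.debian.org.
runs='20120129T214502Z
20120130T034236Z
20120130T094123Z
20120130T154328Z'

stamp=$(to_snapshot_stamp 'Mon, 30 Jan 2012 12:52:28 +0100')
run=$(printf '%s\n' "$runs" | last_run_before "$stamp")
printf '%s\n' "$run"
```

As in the Ruby version, a run that falls exactly on the build date is not selected.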

Known bugs we are waiting on

Different problems, and their solutions

Build systems tend to capture information about the environment, which makes them produce different results across different systems despite having the same architecture and software installed.

Ideally, such variations should be fixed in the build system itself, but that is sometimes not possible.

Non-problems

Files in data.tar.gz contain build paths

These should really be patched out in one way or another. This is not useful information and can actually hide real bugs.

For debug files, use debugedit.

The build path is currently embedded in DWARF sections of ELF files. The BuildID used to locate debug files is currently generated using a SHA1 over the debug sections. This means that even after stripping debug information, the binary will differ depending on the build path. Fortunately the problem has already been identified by Fedora and debugedit has a -i option to recompute the BuildID after stripping the build path.

debugedit by itself was not enough for the hello package. The compiler also needs to be given -fno-merge-debug-strings; otherwise the .debug_str section also changes with the build path, and debugedit does not fix that. See bug #722079 for the changes needed to make hello produce a .deb that does not vary with the build path.

Files in data.tar.gz depend on readdir order

The build system needs to be patched to sort directory listings.
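As a rough illustration of what such a patch amounts to, the sketch below lists a directory tree in a stable byte-wise order instead of relying on the order readdir() happens to return. The helper name list_sorted is hypothetical, and the scratch directory exists only for the demonstration.

```shell
#!/bin/sh
# list_sorted (hypothetical): print every path below directory $1 in
# byte-wise sorted order, independent of filesystem layout and locale.
# Assumes file names do not contain newlines.
list_sorted() {
    ( cd "$1" && find . ! -path . | LC_ALL=C sort )
}

# Demonstration on a scratch directory; files are created out of order.
demo=$(mktemp -d)
touch "$demo/zeta" "$demo/alpha" "$demo/mu"
listing=$(list_sorted "$demo")
printf '%s\n' "$listing"
rm -rf "$demo"
```

Pinning LC_ALL=C for the sort matters here too, since collation order would otherwise vary with the builder's locale.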

Files in data.tar.gz vary with the locale

Builds should be made with LC_ALL=C.UTF-8.

It is quite impractical to force such a value in debian/rules, and there is no reason it should not be the default.
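Until that becomes the default, the locale can be pinned from outside the package. The following is only a sketch: with_build_locale is a hypothetical wrapper name, and it assumes the C.UTF-8 locale is available on the build host.

```shell
#!/bin/sh
# with_build_locale (hypothetical): run the given command with the locale
# pinned to C.UTF-8, whatever the caller's environment says.
with_build_locale() {
    env LC_ALL=C.UTF-8 "$@"
}

# Locale-sensitive tools such as sort now behave the same for every builder.
sorted=$(printf 'b\nA\na\nB\n' | with_build_locale sort | tr '\n' ' ')
seen=$(with_build_locale sh -c 'printf %s "$LC_ALL"')
printf '%s\n' "$sorted" "$seen"
```

A full build would be wrapped the same way, for example: with_build_locale dpkg-buildpackage -us -uc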

Actions:

Files in data.tar.gz contain hostname, uname output, username

Actions:

Files in data.tar.gz contain timestamps

For the worst cases, we could record the calls to gettimeofday() on the first build and have something like libfaketime replay them on rebuilds.

Members of control.tar have varying mtime

We can fix this by giving tar the --mtime= option with the date of the last debian/changelog entry or a similar fixed point in time. Change to be done in dpkg-deb/build.c:do_build() around line 462.

Lunar's branch uses a single timestamp for the mtime of every tar member and allows it to be preset during rebuilds, see below.
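The effect of the --mtime= option can be checked outside dpkg. This sketch assumes GNU tar; the helper name pack_with_fixed_mtime, the scratch files, and the timestamp value are illustrative only, and this is not how dpkg-deb itself invokes tar.

```shell
#!/bin/sh
# pack_with_fixed_mtime (hypothetical): archive directory $1 into $2 and
# stamp every member with the epoch timestamp $3. Assumes GNU tar.
pack_with_fixed_mtime() {
    tar --format=gnu --mtime="@$3" -C "$1" -cf "$2" .
}

demo=$(mktemp -d)
mkdir "$demo/pkg"
echo 'Package: hello' > "$demo/pkg/control"

pack_with_fixed_mtime "$demo/pkg" "$demo/first.tar" 1327924348
# Change the on-disk mtimes, as a later rebuild would.
touch -d '2013-08-15 11:02:00' "$demo/pkg" "$demo/pkg/control"
pack_with_fixed_mtime "$demo/pkg" "$demo/second.tar" 1327924348

# With the mtime pinned, both archives should be byte-for-byte identical.
if cmp -s "$demo/first.tar" "$demo/second.tar"; then
    result=identical
else
    result=different
fi
echo "$result"
rm -rf "$demo"
```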

{data,control}.tar.{gz,xz,bz2} do not have timestamps

{data,control}.tar.{gz,xz,bz2} will store files in readdir order

This is dependent on an accident of filesystem layout at build time, so it would sometimes not be reproducible.

We should probably fix this in dpkg by sorting the contents of the tar files.

For control.tar, we need to feed tar a sorted list of files in dpkg-deb/build.c:do_build() around line 462.

For data.tar, we need to sort the output of find in dpkg-deb/build.c:do_build() around line 571.

Changes are discussed in bug #719845. Test case patch for pkg-tests.
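The same idea (feed tar an explicitly sorted file list) can be sketched from the shell. This assumes GNU tar for --no-recursion and -T; the helper name pack_sorted and the scratch tree are hypothetical, and the real change belongs in dpkg-deb as described above.

```shell
#!/bin/sh
# pack_sorted (hypothetical): archive directory $1 into $2 with members in
# byte-wise sorted order. --no-recursion stops tar from descending into
# directories on its own, so member order is exactly the order of the list.
# Assumes GNU tar and file names without newlines.
pack_sorted() {
    ( cd "$1" && find . ! -path . | LC_ALL=C sort ) |
        tar -C "$1" --no-recursion -T - -cf "$2"
}

demo=$(mktemp -d)
mkdir -p "$demo/pkg/usr/bin"
# Create the files out of order on purpose.
touch "$demo/pkg/usr/bin/zeta" "$demo/pkg/usr/bin/alpha"
pack_sorted "$demo/pkg" "$demo/data.tar"
members=$(tar -tf "$demo/data.tar")
printf '%s\n' "$members"
rm -rf "$demo"
```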

.deb ar-archive header contains a timestamp

.deb files are ar archives. The header currently contains the “current time”. It is written by dpkg at line 103 of lib/dpkg/ar.c.

Guillem said he would rather keep this.

Lunar's branch uses a single timestamp for all ar headers and allows it to be preset during rebuilds, see below.
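To see where this timestamp lives, the sketch below reads the mtime field of the first member header. In the common ar format, an 8-byte global magic is followed by 60-byte member headers whose first fields are a 16-byte name and a 12-byte decimal mtime. The helper name ar_first_mtime is hypothetical, and the archive is hand-built for the demonstration rather than produced by dpkg.

```shell
#!/bin/sh
# ar_first_mtime (hypothetical): print the mtime of the first member of the
# ar archive $1. The field sits at byte offset 24 (8-byte magic plus 16-byte
# name) and is 12 bytes long, space padded.
ar_first_mtime() {
    dd if="$1" bs=1 skip=24 count=12 2>/dev/null | tr -d ' '
}

# Hand-built archive with a single debian-binary member, for illustration.
# Header fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) magic(2).
demo=$(mktemp)
{
    printf '!<arch>\n'
    printf '%-16s%-12s%-6s%-6s%-8s%-10s`\n' debian-binary 1327924348 0 0 100644 4
    printf '2.0\n'
} > "$demo"

mtime=$(ar_first_mtime "$demo")
echo "$mtime"
rm -f "$demo"
```

Running the helper on two builds of the same package shows whether this field is what makes them differ.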

dpkg branch handling timestamps and file order

The pu/reproducible_builds dpkg branch published by Lunar makes the file order deterministic, uses a single timestamp for .deb mtimes, and allows that timestamp to be preset.

Usage example (after building and installing the new dpkg):

$ apt-get source hello
$ cd hello-2.8
$ dpkg-buildpackage
[…]
$ cp ../hello_2.8-4_amd64.deb ../hello_2.8-4_amd64.deb.orig
$ DEB_BUILD_TIMESTAMP=$(date +%s -d"$(sed -n -e 's/^Date: //p' ../hello_2.8-4_amd64.changes)") dpkg-buildpackage
[…]
$ sha256sum ../hello_2.8-4_amd64.deb ../hello_2.8-4_amd64.deb.orig
1e944abfceac7e593f6706da971e0444e5cee9aab680de5292d52661940ee9c4  ../hello_2.8-4_amd64.deb
1e944abfceac7e593f6706da971e0444e5cee9aab680de5292d52661940ee9c4  ../hello_2.8-4_amd64.deb.orig

Success!
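The DEB_BUILD_TIMESTAMP value in the example above is derived from the Date field of the .changes file. That conversion can be sketched on its own as follows, assuming GNU date; the helper name changes_timestamp and the abbreviated .changes fragment are illustrative.

```shell
#!/bin/sh
# changes_timestamp (hypothetical): print the Date field of the .changes
# file $1 as seconds since the epoch. Relies on GNU date for -d.
changes_timestamp() {
    date +%s -d "$(sed -n -e 's/^Date: //p' "$1")"
}

# Abbreviated, illustrative .changes fragment.
demo=$(mktemp)
cat > "$demo" << 'EOF'
Format: 1.8
Date: Mon, 30 Jan 2012 12:52:28 +0100
Source: hello
EOF

timestamp=$(changes_timestamp "$demo")
echo "$timestamp"
rm -f "$demo"
```

Because the Date field carries its own UTC offset, the result does not depend on the time zone of the machine doing the rebuild.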

How to build a deb using faketime

sudo apt-get install faketime
cat > /tmp/fakeroot-faketime << 'EOF'
#!/bin/sh
faketime "2013-08-15T11:02:00" fakeroot "$@"
EOF
chmod a+x /tmp/fakeroot-faketime
dpkg-buildpackage -r/tmp/fakeroot-faketime

Note that this retains *one* timestamp: that of the 'ar' container of the *.deb. To erase it, regenerate the package within the fakeroot-faketime environment by unpacking it with dpkg-deb and then repacking it with dpkg-deb.

Note also that this is a total hack and not something I (AsheeshLaroia) think it makes sense to do on the Debian build daemons. In particular, some programs (e.g., gpg) hang forever when time does not advance.

Upstream changes may solve the problems we face with faketime 0.9.1. (rbalint) Faketime upstream has been improved to advance time linearly at a preset pace for each time() call and to save and load timestamps. We could try rebuilding many packages, saving timestamps during the first build and replaying them in successive builds. For example, gnupg 1.4.14-1 builds fine:

NO_FAKE_STAT=1  ~/projects/libfaketime.git/src/faketime -f '+0 i0.01' dpkg-buildpackage -rfakeroot -us -uc

References

Publicity

This section lists URLs, people, and dates for occasions when others have publicly expressed interest in, or shared information about, the project.