Differences between revisions 304 and 305
Revision 304 as of 2015-01-05 14:39:32
Size: 31470
Editor: PaulWise
Comment:
Revision 305 as of 2015-01-05 15:40:31
Size: 1106
Editor: Lunar
Comment: split into multiple subpages
{{{#!wiki note
This is an on-going project. To '''participate''', we recommend you create an account on this wiki, [[https://wiki.debian.org/ReproducibleBuilds?action=subscribe|subscribe]] to this page, and join the [[https://lists.alioth.debian.org/mailman/listinfo/reproducible-builds|reproducible-builds@lists.alioth.debian.org mailing list]]. The IRC channel is `#debian-reproducible` on OFTC.
}}}

<<TableOfContents()>>

Other resources:
 * [[ReproducibleBuilds/About|About]]
 * [[ReproducibleBuilds/Contribute|How to help!]]
 * [[ReproducibleBuilds/ExperimentalToolchain|Experimental build environment]]
 * [[ReproducibleBuilds/Howto|How to make a package build reproducibly]]
 * [[ReproducibleBuilds/History|History of reproducible builds in Debian]]
= Why do we want reproducible builds? =

 * Allow independent verifications that a binary matches what the source intended to produce.
 * Help `Multi-Arch: same` packages co-installation (as they need every matching file to be byte identical).
 * Be able to generate debug symbols for packages which do not have a “debug package”.
 * Ensure packages can be built from source. The archive could be made to accept only reproducible uploads: the maintainer would stop uploading .deb files but keep them referenced in the .changes file. A buildd would then build the source, and the upload would be accepted only if the hashes match.
 * Allow file-level deduplication on Debian mirror sites, or maybe snapshots.d.o, of .deb files whose contents didn't really change between versions.
 * Allow .deb deltas to be smaller.
 * Packages with [[BuildProfileSpec|build profiles]] must offer the exact same functionality for all profiles. Reproducible builds could be used to verify that this is the case.
 * Make sure that Architecture:all packages are built identically on different build architectures.
 * Validate [[ReleaseGoals/CrossBuildableBase|cross-builds]] against native builds.
 * [[http://people.debian.org/~lunar/blog/posts/reproducible_builds_against_rc_bugs/|Find release critical bugs]].

--------

= Useful things you (yes, you!) can do =

 * If you maintain a package for Debian, you can make sure that your package uses a modern debhelper style (e.g. one-liner `debian/rules` with overrides as needed). We aim to fix many causes of non-deterministic builds in the debhelper suite directly, so packages that use debhelper will be much easier to make reproducible with just an upgrade of the toolchain.
 * Look at the last 24h of results from [[https://reproducible.debian.net/userContent/index_last_24h.html|Jenkins reproducible jobs]], pick a package, look at the `debbindiff` output and investigate.
 * Find a way to prevent javadoc from writing timestamps.
 * Find a way to prevent Epydoc from writing timestamps and output links in filesystem order.
 * Find a way to get reproducible PE binaries.
 * Create a patch for DebianPts:pbuilder to build packages in `/usr/src/debian/hello-2.8-1` instead of `/tmp/buildd`.
 * Research about other distributions: NixOS, SUSE (see [[https://build.opensuse.org/package/show/openSUSE:Factory/build-compare|build-compare]]), then write about it on your blog and link to it on this wiki page.
 * Make the `perl` package build entirely without calling perl at all.

If you want to help with this, feel free to ping the mailing list or edit this wiki page.

= Reported bugs =

{{{#!wiki note
[[http://bugs.debian.org/cgi-bin/pkgreport.cgi?usertag=reproducible-builds@lists.alioth.debian.org|Overview of all bug reports concerning reproducible builds]]
}}}

All bugs relevant to the reproducible builds project should use [[bugs.debian.org/usertags|usertags]] with user `reproducible-builds@lists.alioth.debian.org`.

Current usertags in use:

 toolchain:: affects a tool used by other package build systems
 infrastructure:: affects the whole Debian infrastructure or policies
 timestamps:: time of build is recorded during the build process
 fileordering:: build output varies with readdir() order
 buildpath:: path of sources is recorded during the build process
 username:: username is recorded during the build process
 hostname:: hostname is recorded during the build process
 uname:: uname output is recorded during the build process
 randomness:: some build aspects are dependent on (pseudo-)randomness
 cpu:: some build aspects are dependent on CPU features or computation speed
 buildinfo:: issues related to .buildinfo control files

''[[/UserCategory|Control commands to update the view on the BTS]].''

= Lintian tags =

Here's a list of relevant Lintian tags:

 * [[https://lintian.debian.org/tags/package-contains-timestamped-gzip.html|package-contains-timestamped-gzip]]

= Archive wide rebuilds =

 * [[ReproducibleBuilds/Rebuild20130907|2013-09-07]] by David Suárez. 24% of 5240 source packages reproducible. Variations: time, build path.
 * [[ReproducibleBuilds/Rebuild20140126|2014-01-26]] by David Suárez. 67% of 6887 source packages reproducible. Variations: time, build path.
 * [[ReproducibleBuilds/RebuildCore20140919|2014-09-19]] by Lunar, 30% of 172 source core packages reproducible. Variations: time, file order.
 * [[https://reproducible.debian.net/userContent/reproducible.html|Updated daily since 2014-09-28]] by jenkins.debian.net. On 2014-11-11, 13213 (61.4%) out of 21448 packages are reproducible.

[[UltimateDebianDatabase|UDD]] query to get a list of core packages (172 as of 2014-09-19):
{{{
SELECT DISTINCT source
  FROM packages
 WHERE release = 'sid'
   AND section != 'debian-installer'
   AND ( essential = 'yes'
        OR build_essential = 'yes'
        OR priority IN ('required', 'important', 'standard')
       )
 ORDER BY source;
}}}

--------
= Reproducing builds =

There are two sides to the problem: first we need to record the initial build
environment, and then we need a way to set up the same environment.

== Recording the environment ==

Information on a build will be recorded in a [[ReproducibleBuilds/BuildinfoSpecification|new control file with extension `.buildinfo`]].

== Reproduce the build environment ==

The srebuild program is a sbuild wrapper which finds a timestamp from
snapshot.debian.org containing all versions of the binary packages in a
`.buildinfo` file and then carries out the build with the right versions
installed.

See [[https://lists.alioth.debian.org/pipermail/reproducible-builds/Week-of-Mon-20141229/000613.html|srebuild]].

--------
= Reproducible builds automated on jenkins.d.n =

Several jobs have been created to regularly test packages (from sid main) on jenkins.d.n. As a result there is the [[https://reproducible.debian.net/userContent/reproducible.html|reproducible build overview of packages]], which will eventually have results for all >21k source packages in Debian.

So far the setup is explained only in [[http://layer-acht.org/thinking/blog/20140925-reproducible-builds/|this blog post]], which is somewhat outdated by now and needs to be amended.
--------
= The basics for making packages build reproducible =

Currently a plain sid environment is not enough to build packages reproducibly with ease. Instead, a few packages need to be taken from the reproducible APT repository described under “Custom build environment” (at least {{{debhelper >= 9.20141004~reproducible1}}} and {{{dpkg >= 1.17.17~reproducible1}}} are needed). Beyond this, these are the basics for different types of packaging:

 1. use {{{dh}}} with {{{compat=9}}}
 1. use {{{debhelper}}} version 7 style packaging, add {{{strip-nondeterminism}}} to build-depends and make these modifications to debian/rules:
  1. add {{{dh_strip_nondeterminism}}} before {{{dh_compress}}}
  1. add {{{dh_fixmtimes}}} after {{{dh_md5sums}}}, right before {{{dh_builddeb}}}
  1. add {{{dh_genbuildinfo}}} at the end of the dh_ sequence, so after {{{dh_builddeb}}}
 1. use {{{cdbs}}} (needs {{{cdbs >= 0.4.127~reproducible1}}})

Other types of packaging should be avoided and really be converted to {{{dh}}}.

With this, the basics should be covered and simple packages should build reproducibly. See the next section for a discussion of common reproducibility issues and their solutions.

--------
= Identified problems, and possible solutions =

Build systems tend to capture information about the environment that
makes them produce different results across different systems, despite
having the same architecture and software installed.

Ideally, such variations should be fixed in the build system itself, but
it might sometimes not be possible.

== Files in data.tar.gz contains build paths ==

The build path is embedded in DWARF sections of ELF files, among other
types of files generated during builds. This has proven a real headache
to fix after the path has been captured.

We are thus going to make it mandatory to build packages in a directory named
like `/usr/src/debian/hello-2.8-1`.

As a bonus, this means that it will be easier to unpack packages in this
canonical location for use with tools looking at the source code like `gdb`.

== Files in data.tar.gz depends on readdir order ==

The build system needs to be patched to sort directory listings.
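
As a minimal sketch of the idea (not taken from any particular build system), a shell-based build step can pass its file list through `sort` under a fixed locale instead of relying on the order returned by readdir(). The function name `list_sorted` is hypothetical, and `-print0` and `sort -z` assume GNU findutils and coreutils.

{{{
# list_sorted DIR: print the regular files below DIR, one per line,
# in byte order, independent of the filesystem's readdir() order.
# (Hypothetical helper; file names containing newlines are not handled.)
list_sorted() {
    # LC_ALL=C makes sort compare raw bytes, so the locale cannot change the order.
    (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | tr '\0' '\n')
}
}}}

A build script would then iterate over the output of `list_sorted` wherever it previously used an unsorted `find` or a directory scan.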

=== Epydoc ===

It looks like DebianPts:python-epydoc will produce links in an order that depends on the readdir order. ''This needs to be investigated.''

== Files in data.tar.gz varies with the locale ==

Builds should be made with `LC_ALL=C.UTF-8`.

It's quite impractical to force such a value in `debian/rules` and there is
actually no reason this should not be the default.

'''Actions:'''

 * We could make dpkg-buildpackage export this variable, but we would need to change the policy to make dpkg-buildpackage the canonical way to build packages.
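
Until something like that is decided, a wrapper can set the variable for a single build. The following is only a sketch: the function name `build_in_fixed_locale` is hypothetical, and it assumes the `C.UTF-8` locale is available in the build environment.

{{{
# build_in_fixed_locale CMD [ARGS...]: run a build command with a fixed locale.
# (Hypothetical helper, not part of dpkg or debhelper.)
build_in_fixed_locale() {
    # LC_ALL overrides LANG and every other LC_* variable for the child process.
    env LC_ALL=C.UTF-8 "$@"
}

# Example: build_in_fixed_locale dpkg-buildpackage -b
}}}

Using `env` keeps the setting local to the build command, so the caller's own locale is left unchanged.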

== Files in data.tar.gz contains hostname, uname output, username ==

We could write an `LD_PRELOAD` library, modeled on `libfaketime`, that returns consistent results for several system calls. Bdale suggested we call it `liblietome`.

But we can also take the position that no build system should capture such
information or produce different builds depending on it, and fix those that do.

== Files in data.tar contains timestamps ==

Recommended solutions in order of preference:

 * Prevent the timestamp from being written entirely in the build products.
 * Tell the tools to use the timestamp of “0” if the timestamp is not used.
 * Tell the tools to use the timestamp of the last `debian/changelog` entry.
 * Strip timestamps at the end of the build process.
 * Replace timestamps at the end of the build process.
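
To illustrate the third option, here is a rough sketch of reading the date of the latest `debian/changelog` entry as a Unix timestamp. The function name `changelog_epoch` is hypothetical and `date -d` assumes GNU coreutils; in a real package, `dpkg-parsechangelog -SDate` is likely the more robust way to read the date.

{{{
# changelog_epoch FILE: print the date of the newest entry in a Debian
# changelog as seconds since the Unix epoch. (Hypothetical helper.)
changelog_epoch() {
    # The newest entry comes first; its trailer line looks like
    # " -- Name <email>  Mon, 05 Jan 2015 15:40:31 +0000".
    stamp=$(sed -n 's/^ -- .*>  \(.*\)$/\1/p' "$1" | head -n 1)
    date -u -d "$stamp" +%s
}

# Example: reset a generated file to that date.
# touch --date="@$(changelog_epoch debian/changelog)" generated-file
}}}

The resulting number can be handed to any tool that accepts an explicit timestamp instead of reading the clock.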

Specific issues:

 * [[ReproducibleBuilds/TimestampFromCPPMacros|Timestamps from C pre-processor macros]]
 * [[ReproducibleBuilds/TimestampInPhpRegistryFiles|Timestamps in PHP registry files]]
 * [[ReproducibleBuilds/TimestampInGeneratedDocumentation|Timestamp in generated documentation]]
 * [[ReproducibleBuilds/TimestampInLongLinkTarMembers|“Long link” Tar members contain a timestamp]]
 * [[ReproducibleBuilds/TimestampInStaticLibraries|Timestamps in static libraries (.a files)]]
 * [[ReproducibleBuilds/TimestampInDocumentationGeneratedByEpydoc|Timestamps in documentation generated by Epydoc]]
 * [[ReproducibleBuilds/TimestampInDocumentationGeneratedByJavadoc|Timestamps in documentation generated by Javadoc]]
 * [[ReproducibleBuilds/TimestampInDocumentationGeneratedByDoxygen|Timestamps in documentation generated by Doxygen]]
 * [[ReproducibleBuilds/TimestampInPEBinaries|Timestamps in PE binaries]]
 * [[ReproducibleBuilds/TimestampInDviGeneratedByLaTeX|Timestamps in .dvi files generated by LaTeX]]
 * [[ReproducibleBuilds/TimestampInPDFGeneratedByLaTeX|Timestamps in .pdf files generated by LaTeX]]
 * [[ReproducibleBuilds/TimestampInMavenPomProperties|Timestamps in pom.properties files generated by Maven]]
 * [[ReproducibleBuilds/RStatisticalComputingPackages|Timestamps in R files]]
 * [[ReproducibleBuilds/TimestampInDatabaseGeneratedByQhelpgenerator|Timestamps in database generated by qhelpgenerator]]

Now fixed:

 * [[ReproducibleBuilds/TimestampInGhcInterfaces|Timestamps in ghc --show-ifaces]]
 * [[ReproducibleBuilds/TimestampInGzipHeaders|Timestamp in gzip headers]]
 * [[ReproducibleBuilds/TimestampInJarFiles|Timestamp in jar files]]

== Symlinks in data.tar contain varying file mode ==

POSIX says that the file mode on symlinks is undefined, so this ends up being system dependent behavior. On Linux the umask is ignored when creating symlinks, but on other systems such as kFreeBSD or Hurd the umask is honored, which can produce varying file modes. This is at least a problem for architecture independent packages which are built on different operating systems.

A way to always get the same file mode for symlinks is to set umask to 0 before creating them, and restore the previous umask afterwards, but this might be unfeasible in general.
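
In a shell-based build step, that approach could look like the sketch below. The function name `make_symlink` is hypothetical; running the commands in a subshell takes the place of the explicit restore step.

{{{
# make_symlink TARGET LINK: create a symlink with umask 0 in effect.
# (Hypothetical helper.)
make_symlink() {
    # The parentheses start a subshell, so the caller's umask is left
    # untouched and no explicit "restore" step is needed.
    (umask 0 && ln -s "$1" "$2")
}
}}}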

== Generation of files in data.tar depends on (pseudo-)randomness ==

Now fixed:

 * [[ReproducibleBuilds/RandomOrderInOcamlMd5sums|Random order in OCaml md5sums files]]

== Members of control.tar and data.tar have varying mtimes ==

`dpkg-deb` will record the mtime of files it packs in `control.tar` and `data.tar`. This is bad as most of these files are generated during the build process and will thus change with each build.

Bug:759886 contains a patch against DebianPts:debhelper that adds a new `dh_fixmtimes` helper that will ensure that the mtime of any file created after the date of latest changelog entry will be set to the date of the latest changelog entry.
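
The clamping behaviour described there can be approximated in plain shell. This is a sketch of the idea only, not the code of the proposed `dh_fixmtimes`: the function name `clamp_mtimes` is hypothetical, and `-newermt`, `xargs -r` and `touch --date` assume GNU findutils and coreutils.

{{{
# clamp_mtimes DIR EPOCH: set the mtime of everything below DIR that is newer
# than EPOCH (seconds since the Unix epoch) back to EPOCH.
# (Hypothetical helper.)
clamp_mtimes() {
    # Older files keep their mtime; only files newer than EPOCH are touched.
    find "$1" -newermt "@$2" -print0 |
        xargs -0r touch --no-dereference --date="@$2"
}
}}}

Here `EPOCH` would be the date of the latest changelog entry, expressed in seconds since the Unix epoch.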

== {data,control}.tar.{gz,xz,bz2} will store files in readdir order ==

This is dependent on an accident of filesystem layout at build time, so it would
sometimes not be reproducible.

We should probably fix this in dpkg by sorting the contents of the tar files.

Changes are discussed in Bug:719845. [[http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=39;att=0;bug=719845|Test case patch for pkg-tests]]. [[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=719845#54|Patches that fork `sort` to get a stable order for files in control and data archives]].
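
Outside of dpkg, the same effect can be sketched with GNU tar by feeding it an explicitly sorted member list. This illustrates the principle and is not the patch from the bug report; the function name `sorted_tar` is hypothetical.

{{{
# sorted_tar DIR OUT: pack the contents of DIR into OUT with members in
# byte-sorted order instead of readdir() order. (Hypothetical helper.)
sorted_tar() {
    # --no-recursion stops tar from re-reading directories on its own,
    # so the member order is exactly the order of the sorted list.
    (cd "$1" && find . -mindepth 1 -print0 | LC_ALL=C sort -z |
        tar --no-recursion --null -T - -cf -) > "$2"
}
}}}

Sorting only fixes the member order; member mtimes and ownership still have to be normalized separately.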

== Files generated by debhelper depend on readdir order ==

At least the shlibs files generated by dh_makeshlibs depend on the order returned by find(1). There might be other debhelper programs with the same issue.

== Randomness in control file ==

Now fixed:

 * [[ReproducibleBuilds/RandomOrderInDependsForPython3|Random order for python3 Depends]]

== .deb ar-archive header contains a timestamp ==

.deb files are ar archives. The header currently contains the “current time”.

Bug:759999 contains patches against DebianPts:dpkg that will preset the timestamp to the time of the latest entry of `debian/changelog` when a package is built using `dpkg-buildpackage`.

== building the kernel ==

Uncoordinated experiments, see [[SameKernel]].

== XSLT generate-id() is non-deterministic ==

XSLT's generate-id() function is explicitly allowed by the XSLT spec to be non-deterministic, and is frequently implemented using memory addresses of XML nodes, which are of course non-deterministic thanks to ASLR. Consequently, files that are generated by XSLT (typically documentation) and include the result of generate-id() in their output do not build deterministically.

piuparts, which uses xmlto to generate documentation, is affected by this.

== PNG files contain timestamps ==

If the PNG is being generated by ImageMagick, some options can be passed to `convert` to inhibit inclusion of the timestamps. Please note that the options have to be passed just before the output filename and `composite` only supports the time chunk removal, for example:

{{{
convert -scale 32x32 input.png +set date:create +set date:modify -define png:exclude-chunk=time output.png
composite -compose CopyOpacity mask.png input.png -define png:exclude-chunk=time output.png
}}}

--------

= Custom build environment =

We maintain a set of modifications to the toolchain to perform our
experiments. Commit notifications are sent to a [[https://lists.alioth.debian.org/mailman/listinfo/reproducible-commits|dedicated mailing list]].

Our modified packages can be found in the following APT archive, which is signed by [[https://reproducible.alioth.debian.org/reproducible.gpg|49B6 5747 36D0 B637 CC37 01EA 5DB7 CA67 EA59 A31F]]:

{{{
deb http://reproducible.alioth.debian.org/debian/ ./
deb-src http://reproducible.alioth.debian.org/debian/ ./
}}}

== debhelper ==

The [[http://anonscm.debian.org/gitweb/?p=reproducible/debhelper.git;a=shortlog;h=refs/heads/pu/reproducible_builds|pu/reproducible_builds]] debhelper branch in the `reproducible` project contains several fixes and makes `dh` call `dh_strip_nondeterminism` (see below) before `dh_compress`. See the [[http://anonscm.debian.org/cgit/reproducible/debhelper.git/tree/debian/changelog?h=pu/reproducible_builds|changelog]] for details.

== dpkg ==

The [[http://anonscm.debian.org/gitweb/?p=reproducible/dpkg.git;a=shortlog;h=refs/heads/pu/reproducible_builds|pu/reproducible_builds]] dpkg branch in the `reproducible` repository:

 1. makes file order deterministic in the control and data parts of the .deb,
 2. uses a single timestamp for .deb ar members,
 3. presets that timestamp to the date of the latest changelog entry,
 4. adds `-Wdate-time` to `CPPFLAGS` in `dpkg-buildflags`,
 5. adds support for .buildinfo files.

== strip-nondeterminism ==

[[https://anonscm.debian.org/cgit/reproducible/strip-nondeterminism.git/|strip-nondeterminism]] is a post-processing tool that will normalize various file types. `dh_strip_nondeterminism` will be run by `debhelper` at the end of
the build process.

== cdbs ==

The [[http://anonscm.debian.org/gitweb/?p=reproducible/cdbs.git;a=shortlog;h=refs/heads/pu/reproducible_builds|pu/reproducible_builds]] cdbs branch in the `reproducible` project contains a fix for Bug:764478 which makes cdbs call the newly introduced dh_strip_nondeterminism commands.

== javatools ==

The [[http://anonscm.debian.org/gitweb/?p=reproducible/javatools.git;a=shortlog;h=refs/heads/pu/reproducible_builds|pu/reproducible_builds]] javatools branch in the `reproducible` repository changes `javahelper` to call `jh_installlibs` after `dh_install`. See Bug:764988.

== discount ==

DebianPts:discount needs a patch to produce stable output of email addresses. See Bug:762622 and the [[https://anonscm.debian.org/cgit/reproducible/discount.git/log/?h=pu/reproducible_builds|pu/reproducible_builds branch]].

== ghc ==
The [[http://anonscm.debian.org/cgit/reproducible/ghc.git/log/?h=pu/reproducible_builds|pu/reproducible_builds]] ghc branch in the `reproducible` repository changes `ghc` to not include timestamps in interface hashes. See Bug:769893

== python-setuptools ==

DebianPts:python-setuptools needs a [[https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=python-setuptools_5.5.1-1_reproducible.diff;att=1;bug=773969|patch]] to write names to `top_level.txt` in a stable order. See Bug:773969

== r-base ==

DebianPts:r-base needs a [[https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=r-base_3.1.2-2_reproducible0.diff;att=1;bug=774031|patch]] to remove build time and username written while building packages. See Bug:774031

== fontforge ==

DebianPts:fontforge needs a patch to propagate creation and modification times from source file. See Bug:774148 and the [[https://anonscm.debian.org/cgit/reproducible/fontforge.git/log/?h=pu/reproducible_builds|pu/reproducible_builds branch]].

== python-qt4 ==

DebianPts:python-qt4 needs a patch to remove timestamps from generated files. See Bug:774509

== pyqt5 ==

DebianPts:pyqt5 needs a patch to remove timestamps from generated files. See Bug:774510

== libxslt ==

DebianPts:libxslt needs a patch to make generate-id() return identifiers in a deterministic way. See the [[https://anonscm.debian.org/cgit/reproducible/libxslt.git/log/?h=pu/reproducible_builds|pu/reproducible_builds branch]].

== Usage example ==

If you already have `pbuilder` set up, it's fairly easy to set up an environment with the custom toolchain:

{{{
sudo cp /var/cache/pbuilder/base.tgz /var/cache/pbuilder/base-reproducible.tgz
sudo pbuilder --login --save-after-exec --basetgz /var/cache/pbuilder/base-reproducible.tgz
echo 'deb http://reproducible.alioth.debian.org/debian/ ./' > /etc/apt/sources.list.d/reproducible.list
apt-get install busybox
busybox wget -O- http://reproducible.alioth.debian.org/reproducible.gpg | apt-key add -
apt-get purge busybox
apt-key fingerprint | grep '49B6 5747 36D0 B637 CC37 01EA 5DB7 CA67 EA59 A31F' || echo 'Something is wrong'
apt-get update
apt-get upgrade
exit 0
}}}

Once that's done, one can use the [[http://anonscm.debian.org/cgit/reproducible/misc.git/tree/prebuilder|prebuilder script]] or manually test a package through:

{{{
apt-get source --download-only acl
sudo DEB_BUILD_OPTIONS=nocheck pbuilder --build --debbuildopts '-b' --basetgz /var/cache/pbuilder/base-reproducible.tgz acl_*.dsc
mkdir b1 b2
dcmd cp /var/cache/pbuilder/result/acl_*.changes b1
sudo dcmd rm /var/cache/pbuilder/result/acl_*.changes
sudo DEB_BUILD_OPTIONS=nocheck pbuilder --build --debbuildopts '-b' --basetgz /var/cache/pbuilder/base-reproducible.tgz acl_*.dsc
dcmd cp /var/cache/pbuilder/result/acl_*.changes b2
sudo dcmd rm /var/cache/pbuilder/result/acl_*.changes
}}}

`debbindiff` (available in Debian main) is useful to check the result:

{{{
debbindiff --html $output_file b1/*.changes b2/*.changes
}}}
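
Before reaching for `debbindiff`, a plain byte comparison already tells whether the two builds match at all. The following is a small sketch under the assumption that the results were copied to `b1` and `b2` as above; the function name `same_debs` is hypothetical.

{{{
# same_debs DIR1 DIR2: succeed only if every .deb in DIR1 has a
# byte-identical counterpart of the same name in DIR2. (Hypothetical helper.)
same_debs() {
    rc=0
    for deb in "$1"/*.deb; do
        # An unmatched glob stays literal; skip it instead of reporting a bogus name.
        [ -e "$deb" ] || continue
        name=$(basename "$deb")
        if cmp -s "$deb" "$2/$name"; then
            echo "identical: $name"
        else
            echo "differs:   $name"
            rc=1
        fi
    done
    return "$rc"
}

# Example: same_debs b1 b2 || debbindiff b1/*.changes b2/*.changes
}}}

A missing counterpart in the second directory is reported as a difference, since `cmp` fails in that case too.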

== Adding a package to the APT archive ==

On `alioth.debian.org`:
 1. Import the private signing key to your keyring, if you haven't already: `gpg --import /home/groups/reproducible/private/reproducible-private.gpg`
 1. Place the package files in `/home/groups/reproducible/htdocs/debian/`
 1. Run `make` from that directory.

--------

= Countering "trusting trust" issues by building Perl without Perl =

[[http://cm.bell-labs.com/who/ken/trust.html|Ken Thompson's classic "trusting trust" essay]] outlines an attack vector for embedding code in a compiler that will then be automatically added to anything that the compiler compiles. Since the main approach for making reproducible builds, using debhelper tools written in Perl, relies heavily on Perl, Debian's Perl package should be reproducibly buildable without any Perl at all. That will ensure the "trusting trust" cycle is broken. The current Perl packaging does not use debhelper, so that's a good start. Also, it looks like the Perl build system is written in `make`.


--------

= bash script to compare two package builds =

Usage: `./diffp r1/hello_2.8-4_amd64.changes r2/hello_2.8-4_amd64`

The script is available in the [[http://anonscm.debian.org/gitweb/?p=reproducible/misc.git|misc.git]] repository.

--------

= bash script to analyze images =
Deterministic images (raw images, qcow2 images, ISOs) are the next logical evolution. There is an analyze_image bash script that creates sha512 hashes of all files included within an image, access rights, symlinks, partition table, bootloader and more. Running it on two images that should match and comparing the resulting reports can help to identify sources of non-determinism in images.
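
The file-level part of such a report can be sketched in a few lines of shell. This is not the analyze_image script itself: it assumes the image is already mounted at a directory, it ignores the partition table and bootloader, the function name `tree_report` is hypothetical, and `-printf` assumes GNU find.

{{{
# tree_report DIR: print a sorted report of permissions, ownership, symlink
# targets and SHA-512 hashes for everything below DIR. (Hypothetical helper.)
tree_report() {
    (
        cd "$1" || exit 1
        # Metadata: path, octal mode, numeric owner:group, symlink target (if any).
        find . -mindepth 1 -printf '%p %m %U:%G %l\n' | LC_ALL=C sort
        # Content: one SHA-512 hash per regular file, in sorted order.
        find . -type f -print0 | LC_ALL=C sort -z | xargs -0r sha512sum
    )
}

# Example: tree_report /mnt/image1 > r1; tree_report /mnt/image2 > r2; diff r1 r2
}}}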

See also:

  * [[ReproducibleInstalls|Reproducible installs]]
  * [[http://lists.alioth.debian.org/pipermail/reproducible-builds/Week-of-Mon-20131209/000009.html|Announcing Whonix's First Implementation of Verifiable Builds]]
  * [[https://www.whonix.org/wiki/Verifiable_Builds|Whonix Verifiable Builds]]
  * [[https://github.com/adrelanos/Whonix/blob/master/help-steps/analyze_image|analyze_image bash script]]

The script does not support ISO images yet. The author (Patrick Schleizer) is interested in generalizing it for more generic Debian use cases.

--------

= Further work =

Having reproducible builds allows us to trust binary packages better, because it becomes easier to have:

  * diversity of buildd location and jurisdiction - build packages in more than one location, including the developer's

  * diversity of buildd hardware, in case of hardware bugs, or malicious implants - a mix of VMs, some real hardware, different CPU manufacturers, different date of manufacture and supplier

  * diversity of people - multiple signatures on a .changes file

  * diversity of kernels, explained below


== Kernel packages ==

Special features of kernel packages (including bootloaders and hypervisors) - GRUB2, Xen, linux, kfreebsd...

  * we put huge trust in them - kernels are the ultimate target of any rootkit, able to completely hide from userland

  * a kernel image built for amd64, if the build system is portable and reproducible enough, will be the same whether built from linux-amd64 or kfreebsd-amd64

  * or maybe from different kernel versions - for example, a jessie build chroot on a wheezy host system

Then we would be better protected from something that could affect many systems at once, such as a kernel vulnerability; or widespread infection by a rootkit, which now must be compatible with more than one type of kernel to go unnoticed.

--------

= References =

 * Mike Perry's discussion of how it took him eight weeks to make the Tor Browser Bundle have this feature: http://people.debian.org/~paulproteus/mike-perry-reproducible-tbb.txt
 * [[http://www.gitian.org/|Gitian: a secure software distribution method]]
 * Deterministic virtual machines:
   * "!ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay" http://web.eecs.umich.edu/virtual/papers/dunlap02.pdf
   * "Debugging operating systems with time-traveling virtual machines" http://web.eecs.umich.edu/virtual/papers/king05_1.pdf
   * "A Particular Bug Trap: Execution Replay Using Virtual Machines" http://arxiv.org/pdf/cs.DC/0310030
   * "!ReTrace: Collecting Execution Trace with Virtual Machine Deterministic Replay" http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.84.5732
   * "Execution Replay for Multiprocessor Virtual Machines" http://www.eecs.umich.edu/~pmchen/papers/dunlap08.slides.ppt
 * [[http://cm.bell-labs.com/who/ken/trust.html|Reflections on Trusting Trust]], by Ken Thompson
 * [[http://www.dwheeler.com/trusting-trust/|Fully Countering Trusting Trust through Diverse Double-Compiling (DDC)]] - a PhD dissertation on how to use reproducible builds to counter the "trusting trust" attack on compilers
 * [[https://blogs.kde.org/2013/06/19/really-source-code-software|Is that really the source code for this software?]] by Jos van den Oever on blogs.kde.org (2013-06-19). Compare reproducing tar from the Debian, Fedora and OpenSUSE packages.
 * [[https://blog.torproject.org/blog/deterministic-builds-part-one-cyberwar-and-global-compromise|Deterministic Builds Part One: Cyberwar and Global Compromise]] by Mike Perry
 * [[https://blog.torproject.org/blog/deterministic-builds-part-two-technical-details|Deterministic Builds Part Two: Technical Details]] by Mike Perry
 * [[http://lists.debian.org/debian-devel/2007/09/msg00746.html|An old discussion on debian-devel]].
 * [[http://lists.debian.org/debian-devel/2000/11/msg01758.html|An even older discussion on debian-devel]] (2000).
 * [[http://securityblog.redhat.com/2013/09/18/reproducible-builds-for-fedora/|Reproducible Builds for Fedora]]
 * [[https://lists.ubuntu.com/archives/ubuntu-devel/2013-September/037673.html|Colin Watson's answer on ubuntu-devel]] to “Will Ubuntu use "reproducible builds" as debian is planning to do?”
 * [[https://guardianproject.info/2014/06/09/our-first-deterministic-build-lil-debi-0-4-7/|Our first deterministic build: Lil’ Debi 0.4.7]] ([[https://github.com/guardianproject/lildebi/wiki/Deterministic-Builds|wiki]])
 * [[https://air.mozilla.org/why-and-how-of-reproducible-builds-distrusting-our-own-infrastructure-for-safer-software-releases/|Why and How of Reproducible Builds: Distrusting Our Own Infrastructure for Safer Software Releases]], Seth Schoen and Mike Perry at Mozilla San Francisco, 2014-11-05
 * Misc. upstream discussions:
   * Octave: [[https://savannah.gnu.org/bugs/?43087|bug report]] and [[https://www.marshut.net/kkyttn/reproducible-builds-bug-report-43087.html|mailing list thread]]
   * groff: [[https://lists.gnu.org/archive/html/groff/2014-08/msg00112.html|mailing list thread]]
   * GHC (Glasgow Haskell Compiler): [[https://ghc.haskell.org/trac/ghc/ticket/4012|#4012]]

= Presentations =

 * [[https://fosdem.org/2014/schedule/event/reproducibledebian/|Reproducible Builds for Debian]], ''Distributions devroom'', FOSDEM’14, [[http://video.fosdem.org/2014/H1302_Depage/Saturday/Reproducible_Builds_for_Debian.webm|Video]], [[http://reproducible.alioth.debian.org/presentations/2014-02-01-FOSDEM14.pdf|Slides]] ([[http://anonscm.debian.org/gitweb/?p=reproducible/presentations.git;a=tree;f=2014-02-01-FOSDEM14;hb=HEAD|Sources]])
 * [[https://summit.debconf.org/debconf14/meeting/78/reproducible-builds-for-debian/|Reproducible Builds, a year later]], [[http://debconf14.debconf.org/|DebConf14]], [[http://meetings-archive.debian.net/pub/debian-meetings/2014/debconf14/webm/Reproducible_Builds_for_Debian_a_year_later.webm|Video]], [[http://reproducible.alioth.debian.org/presentations/2014-08-26-DebConf14.pdf|Slides]] ([[http://anonscm.debian.org/gitweb/?p=reproducible/presentations.git;a=tree;f=2014-08-26-DebConf14;hb=HEAD|Sources]])
 * [[https://events.ccc.de/congress/2014/Fahrplan/events/6240.html|Reproducible Builds, Moving Beyond Single Points of Failure for Software Distribution]], [[https://events.ccc.de/congress/2014/|31st Chaos Communication Congress]], [[http://media.ccc.de/browse/congress/2014/31c3_-_6240_-_en_-_saal_g_-_201412271400_-_reproducible_builds_-_mike_perry_-_seth_schoen_-_hans_steiner.html|Video]], [[https://events.ccc.de/congress/2014/Fahrplan/system/attachments/2491/original/2014CCCReproducible.pdf|Slides]]

= Publicity =

This section lists URLs, people, and dates for when other people have publicly expressed interest, or shared information about, the project.

 * Mike Perry, 2013-08-20: [[https://blog.torproject.org/blog/deterministic-builds-part-one-cyberwar-and-global-compromise|Deterministic Builds Part One: Cyberwar and Global Compromise]]
 * Jake Edge, 2013-08-21: [[https://lwn.net/Articles/564263/|Security software verifiability]]
 * Holger Levsen, 2014-09-26: [[http://layer-acht.org/thinking/blog/20140925-reproducible-builds/|Reproducible builds? I never did any - manually]]

= Related projects =

 * [[http://reproducible.io/|CARE]] monitors the execution of the specified command to create an archive that contains all the material required to re-execute it in the same context.

It should be possible to reproduce, byte for byte, every build of every package in Debian.

Drivers:

  • Lunar
  • h01ger

Status

  • The current focus is on the toolchain: trying to get as few changes as possible into key packages to make as many builds as possible reproducible.
  • We have a custom toolchain that will allow a good amount of packages to be reproducible, as long as they use dh for their build process.

  • We have a specification and a prototype implementation for recording the build environment.