Differences between revisions 23 and 24
Revision 23 as of 2015-01-19 17:07:00
Size: 9587
Editor: Lunar
Comment: write some more
Revision 24 as of 2015-01-21 08:55:02
Size: 10888
Editor: Lunar
Comment: write some more

Please keep in mind that history is written by the winners. Let's just hope for not too much betrayal.

Tell the tale

A history of reproducible builds in Debian, mostly written by Lunar.

An old idea

The idea of reproducible builds is not very new. In the Debian world, it was first mentioned in 2000, and then more explicitly in 2007 on debian-devel: “I think it would be really cool if the Debian policy required that packages could be rebuild bit-identical from source.” Unfortunately, the reactions were not really enthusiastic either time.

Private property + Snowden effect

Interest in reproducible builds picked up again with Bitcoin. Bitcoin users needed a way to trust that they were not downloading corrupted software. Initial versions of Gitian were written in 201 to solve the problem. It drives builds using virtual machines and Git.

The global surveillance disclosures in 2013 raised interest even further. Mike Perry worked on making the Tor Browser build reproducibly for fear of “malware that attacks the software development and build processes themselves to distribute copies of itself to tens or even hundreds of millions of machines in a single, officially signed, instantaneous update”.

Kick-off

The success of making such a large piece of software build reproducibly proved that it was feasible for other projects. This prompted Lunar to organize a discussion at DebConf13 happening in July 2013. Even though it was scheduled at the last minute, there were still about thirty attendees who were very much interested, amongst them members of the technical committee and a few other core teams. Minutes are available.

After some more research during the conference, a wiki page was created. The initial approach was to get Debian to “buy in” to the idea by making five packages from different maintainers build reproducibly. However, it quickly became apparent that, until issues in the toolchain were fixed, it would not be possible to make even a single package reproducible.

First mass-rebuilds

Lunar came up with the first patches for dpkg in August 2013. This enabled hello to build reproducibly. The first large scale rebuild was performed soon after by David Suárez, with variations on time and build path. 24% of 5240 source packages were identified as reproducible. The first version of a “smart” comparison script was written to help review differences.

A second mass rebuild was made before the presentation in the distro devroom at FOSDEM’14. It used a slightly different approach regarding build paths and had binutils built in deterministic mode. 67% of 6887 source packages were found reproducible, a result applauded by the FOSDEM crowd.

The presentation sparked interest and woke up the mailing list created some months earlier. Tomasz Buchert wrote a lintian check for gzip files. Stéphane Glondu worked on sorting logs and experimenting with alternatives for build path issues.
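
To illustrate how a “time” variation can end up in a package, here is a minimal Python sketch based on the gzip format, whose header stores a modification time. That this is what the lintian check mentioned above looks for is an assumption; the text does not say. The payload and timestamps are made up for the example.

```python
import gzip

# Made-up file contents standing in for something shipped in a package.
payload = b"changelog contents\n"

# Same input compressed at two different (hypothetical) build times:
# the timestamp stored in the gzip header makes the outputs differ.
first_build = gzip.compress(payload, mtime=1_400_000_000)
second_build = gzip.compress(payload, mtime=1_400_086_400)

# Same input compressed with a fixed timestamp: the outputs are identical.
fixed_a = gzip.compress(payload, mtime=0)
fixed_b = gzip.compress(payload, mtime=0)

print("timestamped outputs identical:", first_build == second_build)
print("fixed-timestamp outputs identical:", fixed_a == fixed_b)
```

Both timestamped files decompress to the same content; only the container differs, which is enough to break a byte-for-byte comparison.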

.buildinfo control files

In parallel, several approaches to where and how to record the build environment were considered. The first idea was to use the .changes control file through a substitution variable (#719854). Instead, Guillem Jover suggested adding new fields by passing --changes-option="-DBuild-Env=… to dpkg-buildpackage. As for the value, we discovered dh-buildinfo, written by Yann Dirson and described as a “debhelper addon to track package versions used to build a package”. A good fit for reproducible builds!

[[https://summit.debconf.org/debconf14/meeting/78/reproducible-builds-for-debian/|What happened for a year]] was presented at DebConf14. The reception was unexpectedly good and the follow-up BoF truly productive. For one thing, a suitable way to record the build environment was sketched out.

One issue with using .changes files is that they are not kept in the archive. To be used as a way to record the environment, they would need to be distributed with the archive. But this would be a misunderstanding of their purpose. As their name implies, .changes control files represent changes to the archive. They were inherently designed to be transient.

So instead, we had the idea of a new .buildinfo control file which would be added to the archive alongside binary packages, and be uploaded by referencing it in .changes. We quickly drafted a [[https://wiki.debian.org/ReproducibleBuilds/BuildinfoSpecification|specification]], and a couple of days later Niko Tyni came up with [[https://anonscm.debian.org/cgit/reproducible/debhelper.git/tree/dh_genbuildinfo?h=pu/reproducible_builds_2014&id=1543ea2535160bf9578149c681eb7ff324901471|an addition to debhelper]] which created a .buildinfo using the output of the aforementioned dh-buildinfo.

strip-nondeterminism

To be continued…

To add:

Giving up on build paths

Initially we thought that variations happening when building the package from different build paths should be eliminated. This has proven difficult. The main problem that has been identified is that full paths to source files are written in the debug symbols of ELF files.

The first attempt used the -fdebug-prefix-map option, which allows mapping the current directory to a canonical one in what gets recorded. But compiler options get written to the debug file as well, so it has to be combined with -gno-record-gcc-switches to be used for reproducibility. The first large scale rebuild showed that it was also hard to accurately determine what the actual build path had been.
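
As a rough illustration of this first attempt, the following Python sketch assembles the two GCC options for a given build directory. The canonical path, the example build directory and the helper name are hypothetical; the text does not say which canonical location was used or how the flags were injected into builds.

```python
from pathlib import PurePosixPath
from typing import List

# Hypothetical canonical location; not taken from the text.
CANONICAL_BUILD_PATH = "/canonical/build/path"


def debug_path_flags(build_path: str,
                     canonical: str = CANONICAL_BUILD_PATH) -> List[str]:
    """Return GCC flags that keep the real build path out of debug info."""
    # Normalise the path so a trailing slash does not defeat the mapping.
    build_path = str(PurePosixPath(build_path))
    return [
        # Record `canonical` instead of `build_path` in debug info.
        f"-fdebug-prefix-map={build_path}={canonical}",
        # Do not record the command line, which would contain the flag
        # above and therefore the real build path.
        "-gno-record-gcc-switches",
    ]


flags = debug_path_flags("/tmp/build-1234/hello-2.8/")
print(" ".join(flags))
```

The sketch also shows why the approach is fragile: the mapping only works if the build path passed in is exactly the one the compiler sees.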

The second attempt used debugedit, which is used by Fedora and others to change the source paths to a canonical location after the build. Unfortunately, gcc writes debug strings in a hashtable. debugedit does not reorder the table after patching the strings, so the result is still unreproducible. Adding this feature to debugedit looked difficult. The approach can still be made to work by passing -fno-merge-debug-strings, but this is expensive in space. The second large scale rebuild used the latter approach. It was still difficult to guess the initial build path properly. Stéphane Glondu was the first to suggest using a canonical build path to solve the issue.

During discussions at DebConf14, we revisited the idea, and felt it was indeed appropriate to decide on a canonical build path. It has an added benefit of making it easier to use debug packages: one simply has to unpack the source in the right place, no extra configuration required.

Finally, it was agreed to add a Build-Path field to .buildinfo, as it makes it easier to reproduce the initial build should the canonical build location change.

Archive wide rebuilds

  • 2013-09-07 by David Suárez. 24% of 5240 source packages reproducible. Variations: time, build path.

  • 2014-01-26 by David Suárez. 67% of 6887 source packages reproducible. Variations: time, build path.

  • 2014-09-19 by Lunar. 30% of 172 core source packages reproducible. Variations: time, file order.

  • Updated daily since 2014-09-28 by jenkins.debian.net. On 2014-11-11, 13213 (61.4%) out of 21448 packages are reproducible.

Presentations

With free software, anyone can inspect the source code for malicious flaws. But Debian provides binary packages to its users. The idea of “deterministic” or “reproducible” builds is to empower anyone to verify that no flaws have been introduced during the build process by reproducing byte-for-byte identical binary packages from a given source.

More information about reproducible builds in general is available at reproducible-builds.org.
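
The byte-for-byte check described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the temporary files stand in for binary packages produced by two independent builds of the same source.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(first_build, second_build):
    """True if both build results are byte-for-byte identical."""
    return sha256_of(first_build) == sha256_of(second_build)


# Stand-in files; real use would point at two independently built packages.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "first.deb")
    second = Path(tmp, "second.deb")
    third = Path(tmp, "third.deb")
    first.write_bytes(b"identical bytes")
    second.write_bytes(b"identical bytes")
    third.write_bytes(b"identical bytes, built later")
    same = is_reproducible(first, second)
    differs = is_reproducible(first, third)

print("first vs second reproducible:", same)
print("first vs third reproducible:", differs)
```

Comparing digests rather than whole files means that anyone who publishes the hash of their build lets others check their own result against it.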

Publicity

This section lists URLs, people, and dates for when other people have publicly expressed interest in, or shared information about, the project.

Contributors

  • akira (Maria Valentina Marin)
  • Andrew Ayer
  • Asheesh Laroia
  • Chris West
  • Daniel Kahn Gillmor
  • David Suárez
  • Drew Fisher
  • Guillem Jover
  • Hans-Christoph Steiner
  • Helmut Grohne
  • Holger Levsen
  • josch (Johannes Schauer)
  • Lunar (Jérémy Bobbio)
  • Mattia Rizzolo
  • Niels Thykier
  • Niko Tyni
  • Paul Wise
  • Peter De Wachter
  • Reiner Herrmann
  • Stefano Rivera
  • Stéphane Glondu
  • Steven Chamberlain
  • Tom Fitzhenry
  • Tomasz Buchert
  • Wookey
  • Ximin Luo