It should be possible to reproduce, byte for byte, every build of every package in Debian.
For now, we will start with a few maintainers who want to opt in to this goal as we flesh out the details of what will make it possible. This page tracks our progress.
Contents
- Drivers
- Why do we want reproducible builds?
- Status
- Use cases
- Detailed package status list
- Reproducing builds
- Known bugs we are waiting on
- Different problems, and their solutions
- Non-problems
- Files in data.tar.gz contain build paths
- Files in data.tar.gz depend on readdir order
- Files in data.tar.gz vary with the locale
- Files in data.tar.gz contain hostname, uname output, username
- Files in data.tar.gz contain timestamps
- {data,control}.tar.{gz,xz,bz2} may have timestamps
- {data,control}.tar.{gz,xz,bz2} will store files in readdir order
- References
Drivers
Why do we want reproducible builds?
- Independent verification that a binary matches what the source intended to produce.
- Help co-installation of Multi-Arch: same packages (they need every matching file to be byte identical).
- Be able to generate debug symbols for packages which do not have a “debug package”.
- Others?
Status
- Proof of concept success: hello package: contents of data.tar.gz and control.tar.gz can be made reproducible when 'gzip' is replaced by 'gzip -n' in debian/rules. (#xyz)
- Buy-in within Debian: 5 packages from 5 maintainers are interested, of which 0 so far have reproducible contents of {data,control}.tar.gz
- Waiting on a few dpkg bugs for avoiding timestamps and file order inconsistency in {data,control}.tar.gz (or .xz)
- You can use a script to rebuild a package with the same build-depends that were used by the build daemons. See "Reproducing builds" below.
- Things that need further investigation (by e.g. you!)
- Document how to use Lunar's script to reproduce a build.
- Find out if {control,data}.tar.gz files created by dpkg 1.17.1+ have a timestamp embedded.
Use cases
- If the Debian build daemons are compromised, end users can assure themselves that their binaries are OK if they can regenerate them (and their build dependencies). (You could use a more complicated equivalence test than "do the hashes match?" but if the hashes do match, this is simple.)
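The simple "do the hashes match?" test could look like the sketch below. It assumes sha256sum from coreutils is available; the helper name same_hash and the file names in the usage comment are made up for illustration.

```shell
# Hypothetical helper: succeed only if both files exist and have the same
# SHA-256 digest.
same_hash() {
    first=$(sha256sum < "$1" | cut -d' ' -f1)
    second=$(sha256sum < "$2" | cut -d' ' -f1)
    # An unreadable file yields an empty digest, which must not count as a match.
    [ -n "$first" ] && [ "$first" = "$second" ]
}

# Usage (the file names are placeholders):
#   same_hash official/hello.deb rebuilt/hello.deb && echo "binaries match"
```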
Detailed package status list
- alpine (Asheesh Laroia)
- Status: Untested
- haveged (Lunar)
- Status: Unknown
- iotop (pabs)
- Status: Unknown
- debhelper (joeyh)
- Status: Unknown
- magit (lindi)
- Status: Unknown
Reproducing builds
There are two sides to the problem: first we need to record the initial build environment, and then we need a way to set up the same environment.
Recording the environment
The right place to record the build environment is the .changes file. Rationale: it lists the checksums of the build products and is signed by either the maintainer or the buildd operator.
To add a field to the .changes file, it is possible to add the following in debian/control:
XC-Build-Environment: ${misc:Build-Environment}
The substvars can be filled by using something like:
COLUMNS=999 dpkg -l | awk ' BEGIN { printf "misc:Build-Environment=" } /^ii/ { ORS=", "; print $2 " (= " $3 ")" }' | sed -e 's/, $//' >> debian/substvars
Ideally, this should go in debhelper. But it can be added manually to debian/rules in the meantime.
This does not work currently, as dpkg-genchanges does not substitute the variable before adding the field to .changes! It can be fixed by a trivial patch against dpkg, see bug 719854.
Reproduce the build environment
Someone needs to document Lunar's script here: http://people.debian.org/~paulproteus/lunar-verify-script.rb
Known bugs we are waiting on
Different problems, and their solutions
Build systems tend to capture information about the environment that makes them produce different results across different systems, despite having the same architecture and software installed.
Ideally, such variations should be fixed in the build system itself, but that might not always be possible.
Non-problems
- You might think ELF binaries (e.g. /usr/bin/hello in the hello package) have embedded timestamps. Luckily, they don't!
Files in data.tar.gz contain build paths
These should really be patched out in one way or another. This is not useful information and can actually hide real bugs.
For debug files, use debugedit.
Files in data.tar.gz depend on readdir order
The build system needs to be patched to sort directory listings.
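As a rough illustration of what "sorting directory listings" means, a build script that walks a directory could pipe the listing through sort. This is only a sketch: the helper name sorted_file_list is made up, and it assumes file names without embedded newlines.

```shell
# Hypothetical helper: list the files under a directory in a stable order,
# whatever order readdir happens to return them in.
sorted_file_list() {
    # LC_ALL=C makes sort compare raw bytes, so the order does not depend
    # on the builder's locale either.
    (cd "$1" && find . -type f | LC_ALL=C sort)
}

# Usage (the directory name is a placeholder):
#   sorted_file_list debian/tmp
```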
Files in data.tar.gz vary with the locale
Builds should be made with LC_ALL=C.UTF-8.
It is quite impractical to force such a value in debian/rules, and there is actually no reason this should not be the default.
Actions:
- We could make dpkg-buildpackage export this variable; but we would need to change the policy to make dpkg-buildpackage the canonical way to build packages.
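Until something like that exists, the variable can be set by hand in the shell that starts the build. The sketch below also shows one visible effect of pinning the locale, using sort as a stand-in for any locale-sensitive build step; the variable name sorted is only illustrative.

```shell
# Pin the locale for everything started from this shell.
LC_ALL=C.UTF-8
export LC_ALL

# With the locale pinned, collation follows code point order: uppercase
# ASCII letters sort before lowercase ones, whatever locale the builder
# normally uses.
sorted=$(printf 'b\nA\na\nB\n' | sort | tr '\n' ' ')
echo "$sorted"
```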
Files in data.tar.gz contain hostname, uname output, username
Actions:
- We could write an LD_PRELOAD library that returns consistent results for several system calls, on the same model as libfaketime. Bdale suggested we call it liblietome.
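While no such library exists, a crude way to spot this kind of leak is to search an unpacked package for the build machine's own identifiers. The sketch below is an assumption about how one might do that, not an existing tool: the helper name scan_for_leaks is made up, it relies on grep -r, and a common user name such as "root" will produce false positives.

```shell
# Hypothetical helper: print the files under a directory that contain the
# build machine's host name or the current user name.
scan_for_leaks() {
    for needle in "$(uname -n)" "$(id -un 2>/dev/null)"; do
        # Skip empty values: an empty fixed string would match every file.
        [ -n "$needle" ] || continue
        # -F treats the needle as a fixed string, -l prints only file names.
        grep -rlF -e "$needle" "$1" || true
    done
}

# Usage (the directory name is a placeholder):
#   dpkg-deb -x hello.deb unpacked && scan_for_leaks unpacked
```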
Files in data.tar.gz contain timestamps
- Recommended solution:
- Use the timestamp of the last debian/changelog entry as reference.
- touch all files to the reference timestamp before building the binary packages.
- Use gzip -n when gzipping anything.
- Get rid of non-determinism (yup...)
- Alternate solutions:
- (or) libfaketime (probably breaks some things) (sudo apt-get install faketime)
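The recommended steps could be sketched roughly as follows. The helper names normalize_mtimes and gzip_reproducibly are made up, the debian/tmp path is a placeholder, and the sketch assumes GNU touch (for -h and -d) and a dpkg-parsechangelog that supports -S.

```shell
# Hypothetical helper: set every file under a directory to one reference time.
# The reference is any date string accepted by GNU `touch -d`.
normalize_mtimes() {
    find "$1" -exec touch -h -d "$2" {} +
}

# Hypothetical helper: compress to stdout without storing the original file
# name or modification time in the gzip header.
gzip_reproducibly() {
    gzip -n -c "$1"
}

# In a package build, the reference could be taken from the changelog:
#   reference=$(dpkg-parsechangelog -S Date)
#   normalize_mtimes debian/tmp "$reference"
```

With gzip -n, the compressed output depends only on the file contents, so touching the input afterwards does not change the result.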
{data,control}.tar.{gz,xz,bz2} may have timestamps
- dpkg 1.17.1 might or might not store a timestamp for the .gz versions of these files.
- *.xz and *.bz2 seem to provide no ability to store a timestamp.
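To find out whether a given .gz file carries a timestamp, one can read the MTIME field straight from the gzip header. The sketch below assumes the header layout from RFC 1952; the helper name gzip_mtime and the file names in the usage comment are made up.

```shell
# Hypothetical helper: print the MTIME field of a gzip file as a Unix time.
# The field is bytes 4 to 7 of the header, least significant byte first;
# 0 means that no timestamp was stored.
gzip_mtime() {
    # od prints the four bytes as decimal numbers; `set --` splits them
    # into $1..$4.
    set -- $(od -An -tu1 -j4 -N4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Usage (the file names are placeholders):
#   ar x hello.deb data.tar.gz && gzip_mtime data.tar.gz
```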
{data,control}.tar.{gz,xz,bz2} will store files in readdir order
This is dependent on an accident of filesystem layout at build time, so it would sometimes not be reproducible.
We should probably fix this in dpkg by sorting the contents of the tar files.
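This is not how dpkg builds its archives, but the effect of sorting can be tried out with GNU tar by feeding it a sorted member list. In the sketch below the helper name sorted_tar is made up; it assumes GNU find, sort and tar, and the output file must live outside the directory being archived.

```shell
# Hypothetical helper: archive a directory with members in sorted order and
# fixed metadata. Usage: sorted_tar DIRECTORY OUTPUT.tar MTIME
sorted_tar() {
    # Resolve the output path before changing directory.
    out=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    (cd "$1" &&
        find . -print0 | LC_ALL=C sort -z |
        tar --create --file "$out" --null --no-recursion \
            --mtime="$3" --owner=0 --group=0 --numeric-owner \
            --files-from=-)
}

# Usage (the names are placeholders):
#   sorted_tar debian/tmp /tmp/data.tar "$(dpkg-parsechangelog -S Date)"
```

The --no-recursion option matters here: find already lists every entry, so tar must not descend into directories again in its own (unsorted) order.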
References
Mike Perry's discussion of how it took him eight weeks to make the Tor Browser Bundle have this feature: http://people.debian.org/~paulproteus/mike-perry-reproducible-tbb.txt