Contents
- Introduction
- Testing procedure
- First steps to make a package reproducible
- Identified problems, and possible solutions
- Files in data.tar depend on filesystem order
- Files in data.tar vary with the locale
- Files in data.tar vary with the timezone
- Files in data.tar contain hostname, uname output, username
- Files in data.tar contain timestamps
- Files in data.tar depend on (pseudo-)randomness
- Permissions in data.tar depend on umask
- Other umask related variations
- Symlinks in data.tar contain varying file mode
- Members of control.tar and data.tar have varying mtimes
- Installed-Size differs
- Lintian tags
Introduction
A source package should build byte-for-byte identical products when rebuilt in the same environment. The environment is defined as:
- The architecture the package is built on (build architecture).
- The set of binary packages involved in the build. This includes Essential packages, build-essential, and Build-Depends and Build-Depends-Indep, with the recursive dependencies of each package.
- The path to the build directory.
We have seen variations related to the time of the build, the order of files on the filesystem, the current user, the system hostname, the uname output, (pseudo-)randomness, and the CPU features or load. For a package to be reproducible, the build process must not capture any of these aspects.
Testing procedure
The easiest way to get a test environment for now is to use pbuilder. Set up a build chroot with the experimental toolchain, then clone the reproducible/misc.git repository.
Hacking packages then looks like:
- apt-get source foo
- cd foo*
- dch -v "$(dpkg-parsechangelog -S Version).0~reproducible1"
- Fix reproducibility issues.
- dpkg-buildpackage -S
- cd ../../misc/prebuilder
- dcmd cp ../foo*reproducible1*.dsc .
- ./rebuild.sh foo
If the test is successful, look for a big “reproducible” banner. Otherwise, check the logs and the debbindiff output in the logs directory.
First steps to make a package reproducible
Currently a plain sid environment is not enough to build packages reproducibly. An experimental toolchain is available while changes are waiting to be integrated in the main archive.
The first steps needed to make a package build reproducibly depend on the packaging style. They cover the basics, and simple packages should then build reproducibly. See the next chapter for a discussion of common reproducibility issues and their solutions.
dh
No changes required.
cdbs
No changes required.
Explicit calls to dh_*
Please consider migrating the package to dh.
If the package is not reproducible, start by adding dh_strip_nondeterminism before dh_compress.
Do It Yourself
Please consider migrating the package to dh.
Usual fixes are:
- Make sure that the mtimes of the files in binary packages are deterministic. Example patch
- Prevent gzip from recording the current time. Example patch
- Make sure md5sums is written in a stable order. Example patch
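The gzip and md5sums fixes listed above could be sketched in shell as follows. This is only an illustration, assuming GNU gzip, findutils and coreutils; the function names and the staging layout with a DEBIAN/ subdirectory are hypothetical and are not taken from the example patches.

```shell
# Compress a file without recording its name or modification time.
# gzip -n leaves both fields out of the gzip header.
compress_deterministic() {
    gzip -9n -c "$1" > "$1.gz"
}

# Write DEBIAN/md5sums with entries in a stable, locale-independent order.
# $1: staging directory of the binary package (hypothetical layout).
write_md5sums() {
    (
        cd "$1" || exit 1
        find . -type f ! -path './DEBIAN/*' -printf '%P\0' \
            | LC_ALL=C sort -z \
            | xargs -r0 md5sum > DEBIAN/md5sums
    )
}
```

Sorting with LC_ALL=C keeps the order independent of both the filesystem and the locale of the build machine.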
Identified problems, and possible solutions
Even after the basic recommendations have been applied, a package might still not build reproducibly.
debbindiff has been written to help understand the sources of unreproducibility. It compares two files and outputs their differences, recursively unpacking archives or using transformation tools to generate meaningful diffs. In the Debian context, it is usually called on the .changes files of two different builds.
What follows is a list of common issues and known solutions. For reference, we keep documentation on old issues that should not happen with the current experimental toolchain.
Files in data.tar depend on filesystem order
The build system needs to be patched to sort directory listings.
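As a minimal sketch of what sorting a directory listing can look like in a shell-based build step (the function name is hypothetical, and real build systems will need the equivalent in their own language):

```shell
# List the regular files below a directory in a stable order.
# LC_ALL=C makes sort compare bytes, so the result depends neither on
# the locale nor on the order in which the filesystem returns entries.
list_files_sorted() {
    find "$1" -type f | LC_ALL=C sort
}
```

A build script could iterate over this listing instead of relying on the order returned by the filesystem. File names containing newlines would need NUL-separated output instead.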
Specific issues:
Files in data.tar vary with the locale
The build system should be patched to add LC_ALL=C.UTF-8 to the environment.
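In a shell-driven build, this could be as simple as the following sketch, placed before any tool whose output depends on the locale is run:

```shell
# Force a fixed locale for everything the build runs from here on.
LC_ALL=C.UTF-8
export LC_ALL
```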
Files in data.tar vary with the timezone
The timezone used while building can also have an effect on reproducibility.
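This page does not prescribe a fix here. One possible approach, shown as an assumption rather than as a documented recommendation, is to pin the timezone for the whole build so that formatted dates no longer depend on the build host. The helper name is hypothetical and GNU date is assumed.

```shell
# Pin the timezone so that formatted dates do not depend on the build host.
TZ=UTC
export TZ

# Format a Unix timestamp as an ISO 8601 date and time (GNU date syntax).
format_date() {
    date -d "@$1" +%Y-%m-%dT%H:%M:%S
}
```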
Specific issues:
Files in data.tar contain hostname, uname output, username
Remove places where the information is recorded. If that's not possible, use dummy values. Example
Specific issues:
Files in data.tar contain timestamps
Recommended solutions in order of preference:
- Prevent the timestamp from being written entirely in the build products.
- Tell the tools to use a timestamp of “0” if the timestamp is not used.
- Tell the tools to use the timestamp of the last debian/changelog entry.
- Strip timestamps at the end of the build process.
- Replace timestamps at the end of the build process.
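The option of using the timestamp of the last debian/changelog entry could be sketched as below. The two helper names are hypothetical, GNU date is assumed, and how the resulting value is passed to a given tool depends on that tool.

```shell
# Convert a changelog date (RFC 2822 style, as printed by
# `dpkg-parsechangelog -S Date`) to seconds since the Unix epoch.
changelog_epoch() {
    date -u -d "$1" +%s
}

# Render that epoch as a date string in UTC, suitable for embedding
# in generated files instead of the current time.
format_build_date() {
    date -u -d "@$1" +%Y-%m-%d
}

# Possible use inside an unpacked source package (not run here):
#   epoch=$(changelog_epoch "$(dpkg-parsechangelog -S Date)")
#   format_build_date "$epoch"
```

Formatting in UTC avoids reintroducing a dependency on the timezone of the build machine.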
Specific issues:
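- Timestamps in documentation generated by HTMLDOC
Fixed in Debian, needs to be upstreamed:
- Timestamps from C pre-processor macros (partially fixed in Debian and upstream)
- Timestamps generated by docbook-to-man
- Timestamps in documentation generated by Epydoc
- Timestamps in documentation generated by groff
- Timestamps and ids in PDF generated by Ghostscript
- Timestamps in documentation generated by man2html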
Files in data.tar depend on (pseudo-)randomness
Permissions in data.tar depend on umask
The permissions of files shipped by Debian packages should not change with the value of the file creation mask (umask).
Easy solution: use dh or dh_fixperms. Otherwise, take care to fix the permissions manually.
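For packages that must fix permissions manually, a deliberately simplistic sketch might look like this. It is not a replacement for dh_fixperms: the function name is hypothetical, GNU find is assumed, and files that need special modes would have to be adjusted afterwards.

```shell
# Give every directory mode 755, every executable file mode 755 and
# every other regular file mode 644, whatever the umask was at build time.
# $1: staging directory of the binary package.
normalize_permissions() {
    find "$1" -type d -exec chmod 755 {} +
    find "$1" -type f -perm /111 -exec chmod 755 {} +
    find "$1" -type f ! -perm /111 -exec chmod 644 {} +
}
```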
Other umask related variations
Symlinks in data.tar contain varying file mode
POSIX says that the file mode on symlinks is undefined, so this ends up being system-dependent behavior. On Linux the umask is ignored when creating symlinks, but on other systems such as kFreeBSD or Hurd the umask is honored, which can produce varying file modes. This is at least a problem for architecture-independent packages which are built on different operating systems.
A way to always get the same file mode for symlinks is to set umask to 0 before creating them, and restore the previous umask afterwards, but this might be unfeasible in general.
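The save, clear and restore sequence described above can be sketched as a small shell helper. The function name is hypothetical.

```shell
# Create a symlink with the umask temporarily cleared, then restore it.
# $1: link target, $2: link name.
make_symlink() {
    old_umask=$(umask)
    umask 0
    ln -s "$1" "$2"
    status=$?
    umask "$old_umask"
    return "$status"
}
```

Wrapping every symlink creation this way only works where the build controls that step directly, which is one reason the approach may not be feasible in general.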
Members of control.tar and data.tar have varying mtimes
dpkg-deb records the mtime of the files it packs in control.tar and data.tar. Files generated during the build process get a different mtime with each build. The best solution is to set their mtime to the date of the latest debian/changelog entry.
debhelper will take care of adjusting mtimes (759886).
Example patch for custom packaging style. Known affected packages.
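For a custom packaging style, setting the mtimes could be sketched as follows. This is an illustration only: the function name is hypothetical, GNU find and touch are assumed, and it is not the content of the example patch.

```shell
# Set the mtime of everything below a directory to a fixed timestamp.
# $1: directory, $2: seconds since the Unix epoch (for instance the date
# of the latest debian/changelog entry).
set_mtimes() {
    find "$1" -depth -exec touch --no-dereference --date="@$2" {} +
}
```

The --no-dereference option makes touch change the timestamp of a symlink itself rather than of the file it points to.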
Installed-Size differs
Installed-Size can differ between builds. See 650077.
Guillem Jover wrote a patch for dpkg solving the issue.
Lintian tags
Some Lintian tags can help identify issues that could prevent a package from being reproducible: