Differences between revisions 44 and 45
Revision 44 as of 2015-09-13 09:33:11
Size: 10940
Editor: Lunar
Comment: be more accurate on debhelper exporting SOURCE_DATE_EPOCH and improve formatting
Revision 45 as of 2015-09-17 01:19:57
Size: 11045
Editor: ?BenBoeckel
Comment: CMake: update example to use lower-case command names, use $ENV{SOURCE_DATE_EPOCH}, fix argument quoting, strip trailing whitespace, and formatting cleanup so it is easier to read

Specification document

Please feel free to jump straight to our published SOURCE_DATE_EPOCH specification at https://reproducible-builds.org/specs/source-date-epoch/

Motivation

Many documentation generators like to put the current date in the generated documentation. This is obviously not reproducible. Furthermore, it is not feasible to develop a diff algorithm to ignore "build dates" in arbitrary data formats, and fundamentally impossible in the case of Turing-complete data formats such as executables, since the real behaviour of the result could easily change based on a piece of data embedded in the file, even if the data is itself static or immutable.

Typically, two misguided rationales are given for embedding the build date:

  • it gives "some indication" of the age of the software. However, this becomes basically redundant with reproducible builds, as the whole point of reproducible builds is that the build result will be exactly the same no matter when it was built. To phrase this differently: if the only difference in the build result is the embedded build date, then this difference is meaningless and should be removed, or replaced with a meaningful date.

  • it gives "some indication" of the build environment (e.g. age of the build dependencies?). But with reproducible builds, there is no need to guess which build environment has been used, based on a timestamp. To allow users to reproduce binaries, the build environment is either known in advance, or recorded (e.g. in .buildinfo files).

So, we will try to eliminate this variation: there is nothing to be lost, and everything to be gained.

Goal

Generally, a better solution is to embed the date of the last modification to the source code. This proposal attempts to define some standards for tools to operate, based on this principle.

Proposal

Please read our SOURCE_DATE_EPOCH specification for details.

Rationale

See further below for a more detailed rationale on these.

Upstream build processes are encouraged to read this variable for any embedded timestamps, but generally need not attempt to auto-detect the correct value. (Indeed, auto-detection may result in conflicts with the distribution that would need to be resolved on a per-package basis, which is outside the scope of this document.)

In Debian, this is automatically set to the same time as the latest entry in debian/changelog, i.e. the same as the output of dpkg-parsechangelog -SDate.
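As a rough illustration of the relationship described above, the following Python sketch converts an RFC 2822 date string, of the kind printed by dpkg-parsechangelog -SDate, into a Unix timestamp. The sample date and the helper name changelog_date_to_epoch are made up for illustration; this is not how debhelper itself computes the value.

```python
from email.utils import parsedate_to_datetime


def changelog_date_to_epoch(rfc2822_date: str) -> int:
    """Convert an RFC 2822 date string to seconds since the Unix epoch."""
    # parsedate_to_datetime returns a timezone-aware datetime when the
    # string carries a numeric offset, so timestamp() is unambiguous.
    return int(parsedate_to_datetime(rfc2822_date).timestamp())


# Hypothetical changelog date, for illustration only.
SAMPLE_DATE = "Thu, 17 Sep 2015 01:19:57 +0000"
print(changelog_date_to_epoch(SAMPLE_DATE))
```

The same instant written with a different offset (for example 03:19:57 +0200) maps to the same integer, which is why the epoch form is convenient to pass between tools.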

Setting the variable

For packages using dh, debhelper (in our toolchain, see #791823) exports this variable during builds, so no changes are needed.

CDBS (in our toolchain, see #794241) exports this variable during builds, so no changes are needed.

As a last resort to be avoided where possible, package maintainers may set and export this variable manually in debian/rules:

export SOURCE_DATE_EPOCH = $(shell date -d "$$(dpkg-parsechangelog -SDate)" +%s)

Reading the variable

We are persuading upstream tools to support this directly. You may help by writing patches for these tools; please add links to the bug reports here so we know, and to act as an example resource for future patch writers.

Pending

doxygen, gcc, gettext, libxslt, ocamldoc, qt4-x11, texlive-bin, txt2man

Complete

As a last resort to be avoided where possible (e.g. if the upstream tool is too hard to patch, or too time-consuming for you right now to patch, or if they are being uncooperative or unresponsive), package maintainers may try something like the following:

debian/strip-nondeterminism/a2x:

#!/bin/sh
# Depends: faketime
# Eventually the upstream tool should support SOURCE_DATE_EPOCH internally.
test -n "$SOURCE_DATE_EPOCH" || { echo >&2 "$0: SOURCE_DATE_EPOCH not set"; exit 255; }
exec faketime -f "$(TZ=UTC date -d "@$SOURCE_DATE_EPOCH" +'%Y-%m-%d %H:%M:%S')" /usr/bin/a2x "$@"

debian/rules:

export PATH := $(CURDIR)/debian/strip-nondeterminism:$(PATH)

debian/control:

Build-Depends: faketime

Please be aware that this does not work out of the box with pbuilder on Debian 7 Wheezy; see #778462 against faketime and #700591 against pbuilder (fixed in Jessie, but not Wheezy). As a local workaround, adding a corresponding hook to /etc/pbuilder/hook.d that mounts /run/shm inside the chroot should suffice.

TODO: document some other nicer options. Generally, all invocations of date(1) need to have a fixed TZ environment set.
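To illustrate why a fixed time zone matters, here is a small Python sketch showing that one and the same timestamp renders as two different calendar dates depending on the UTC offset in effect. The timestamp and the helper name render are chosen purely for illustration, and fixed offsets are used instead of the TZ environment variable so that the example stays self-contained.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timestamp: 2015-09-16 23:30:00 UTC.
EPOCH = 1442446200


def render(epoch: int, tz: timezone) -> str:
    """Format the timestamp as a calendar date in the given zone."""
    return datetime.fromtimestamp(epoch, tz=tz).strftime("%Y-%m-%d")


utc_date = render(EPOCH, timezone.utc)
plus2_date = render(EPOCH, timezone(timedelta(hours=2)))

print(utc_date)    # date as seen in UTC
print(plus2_date)  # date as seen two hours ahead of UTC
```

Because the two results differ, a build that formats dates in the builder's local zone can produce different output on different machines even when SOURCE_DATE_EPOCH is identical.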

Examples

Python

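import os
import datetime

try:
    # Use the reproducible timestamp when it is set and is a valid integer.
    d = datetime.datetime.fromtimestamp(
        int(os.environ['SOURCE_DATE_EPOCH']), tz=datetime.timezone.utc)
except (KeyError, ValueError):
    # Fall back to the current time, also expressed in UTC.
    d = datetime.datetime.now(tz=datetime.timezone.utc)

print(d)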

Bash / POSIX shell

DATE_FMT="%Y-%m-%d"
if [ -n "$SOURCE_DATE_EPOCH" ]; then
  BUILD_DATE=$(date -u -d "@$SOURCE_DATE_EPOCH" "+$DATE_FMT" 2>/dev/null || date -u -r "$SOURCE_DATE_EPOCH" "+$DATE_FMT" 2>/dev/null || date -u "+$DATE_FMT")
else
  BUILD_DATE=$(date "+$DATE_FMT")
fi

echo "$BUILD_DATE"

The above works with either GNU or BSD date, and falls back to ignoring SOURCE_DATE_EPOCH if both fail.

Makefile

DATE_FMT = %Y-%m-%d
ifdef SOURCE_DATE_EPOCH
    BUILD_DATE ?= $(shell date -u -d "@$(SOURCE_DATE_EPOCH)" "+$(DATE_FMT)"  2>/dev/null || date -u -r "$(SOURCE_DATE_EPOCH)" "+$(DATE_FMT)" 2>/dev/null || date -u "+$(DATE_FMT)")
else
    BUILD_DATE ?= $(shell date "+$(DATE_FMT)")
endif

echo $(BUILD_DATE)

The above works with either GNU or BSD date, and falls back to ignoring SOURCE_DATE_EPOCH if both fail.

CMake

if (DEFINED ENV{SOURCE_DATE_EPOCH})
  execute_process(
    COMMAND "date" "-u" "-d" "@$ENV{SOURCE_DATE_EPOCH}" "+%d/%m/%Y"
    OUTPUT_VARIABLE BUILD_DATE
    OUTPUT_STRIP_TRAILING_WHITESPACE)
else ()
  execute_process(
    COMMAND "date" "+%d/%m/%Y"
    OUTPUT_VARIABLE BUILD_DATE
    OUTPUT_STRIP_TRAILING_WHITESPACE)
endif ()

The above works only with GNU date. See the POSIX shell example above for how to support BSD date.

C

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

struct tm *build_time;
time_t now;
char *source_date_epoch;
unsigned long long epoch;
char *endptr;

source_date_epoch = getenv("SOURCE_DATE_EPOCH");
if (source_date_epoch) {
        errno = 0;
        epoch = strtoull(source_date_epoch, &endptr, 10);
        if ((errno == ERANGE && (epoch == ULLONG_MAX || epoch == 0))
                        || (errno != 0 && epoch == 0)) {
                fprintf(stderr, "Environment variable $SOURCE_DATE_EPOCH: strtoull: %s\n", strerror(errno));
                exit(EXIT_FAILURE);
        }
        if (endptr == source_date_epoch) {
                fprintf(stderr, "Environment variable $SOURCE_DATE_EPOCH: No digits were found: %s\n", endptr);
                exit(EXIT_FAILURE);
        }
        if (*endptr != '\0') {
                fprintf(stderr, "Environment variable $SOURCE_DATE_EPOCH: Trailing garbage: %s\n", endptr);
                exit(EXIT_FAILURE);
        }
        if (epoch > ULONG_MAX) {
                fprintf(stderr, "Environment variable $SOURCE_DATE_EPOCH: value must be smaller than or equal to: %lu but was found to be: %llu\n", ULONG_MAX, epoch);
                exit(EXIT_FAILURE);
        }
        now = epoch;
} else {
        now = time(NULL);
}
build_time = gmtime(&now);

Rationale and alternate proposals

Some more clarifying points:

  • SOURCE_DATE_EPOCH is a unix timestamp which is defined as the number of seconds (excluding leap seconds) since 01 Jan 1970 00:00 UTC. This is independent of timezones, i.e. there is no way to specify this "in another timezone".

See 1 for the initial motivation behind this, including an evaluation of how different programming languages handle date formats.

Currently, we do not have a proposal that includes anything resembling a "time zone". Developing such a standard requires consideration of various issues:

Intuitive and naive ways of handling human-readable dates, such as the POSIX date functions, are highly flawed and freely mix implicit, ill-defined calendars with absolute time. For example, they don't specify that they mean the Gregorian calendar, don't specify what to do with dates before the Gregorian calendar was introduced, or use named time zones that require an up-to-date timezone database (e.g. with historical DST definitions) to parse properly.

Since this is meant to be a universal standard that all tools and distributions can support, we need to keep things simple and precise, so that different groups of people cannot accidentally interpret it in different ways. It is therefore probably unwise to try to standardise anything that resembles a named time zone, since that is very complex.

One likely candidate would be something similar to the git internal timestamp format, see man git-commit:

  • It is <unix timestamp> <time zone offset>, where <unix timestamp> is the number of seconds since the UNIX epoch. <time zone offset> is a positive or negative offset from UTC. For example CET (which is 1 hour ahead of UTC) is +0100.

We already have SOURCE_DATE_EPOCH so the time zone offset could be placed in SOURCE_DATE_TZOFFSET or something like that. But all of this needs further discussion.
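As a sketch of what consuming such a pair might look like, the following Python snippet parses a git-style "<unix timestamp> <time zone offset>" string into a timezone-aware datetime. The function name parse_git_style and the sample values are hypothetical, and since the format is not part of any agreed specification this is only an illustration of the idea.

```python
from datetime import datetime, timedelta, timezone


def parse_git_style(value: str) -> datetime:
    """Parse '<unix timestamp> <+hhmm|-hhmm>' into an aware datetime."""
    seconds, offset = value.split()
    # The offset is a sign followed by two digits of hours and two of minutes.
    sign = -1 if offset[0] == "-" else 1
    hours, minutes = int(offset[1:3]), int(offset[3:5])
    tz = timezone(sign * timedelta(hours=hours, minutes=minutes))
    return datetime.fromtimestamp(int(seconds), tz=tz)


# Hypothetical sample value, for illustration only.
print(parse_git_style("1442452797 +0200").isoformat())
```

The offset only affects how the instant is displayed; the underlying timestamp is unchanged, so a tool that only needs SOURCE_DATE_EPOCH could ignore the second field.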

Other non-standard variables that we haven't yet agreed upon, use at your own risk:

export SOURCE_DATE_TZOFFSET = $(shell dpkg-parsechangelog -SDate | tail -c6)
export SOURCE_DATE_RFC2822 = $(shell dpkg-parsechangelog -SDate)
export SOURCE_DATE_ISO8601 = $(shell python -c 'import time,email.utils,sys;t=email.utils.parsedate_tz(sys.argv[1]);s=t[-1];\
print(time.strftime("%Y-%m-%dT%H:%M:%S",t[:-1])+"{0}{1:02d}{2:02d}".format("-" if s<0 else "+",abs(s)//3600,abs(s)//60%60));' "$(SOURCE_DATE_RFC2822)")

The ISO8601 code snippet is complex, in order to preserve the same timezone offset as in the RFC2822 form. If one is OK with stripping out this offset, i.e. forcing to UTC, then one can just use date -u instead. However, this then contains the same information as the unix timestamp, but the latter is generally easier to work with in nearly all programming languages.
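The trade-off described above can be sketched in Python as follows: the first result keeps the original offset, while the second forces UTC and therefore carries no more information than the Unix timestamp. The sample date is hypothetical, and note that isoformat() writes the offset with a colon (+02:00), which differs slightly from the +0200 form produced by the snippet above.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# Hypothetical RFC 2822 date, for illustration only.
SAMPLE_DATE = "Thu, 17 Sep 2015 03:19:57 +0200"

parsed = parsedate_to_datetime(SAMPLE_DATE)

# Keep the offset that was present in the RFC 2822 form.
with_offset = parsed.isoformat()

# Force UTC, discarding the original offset.
as_utc = parsed.astimezone(timezone.utc).isoformat()

print(with_offset)
print(as_utc)
```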

Specification

A specification is published separately at: https://reproducible-builds.org/specs/source-date-epoch/