This is a draft/discussion for a spec in development. Some code and other notes are here: https://github.com/infinity0/rb-prefix-map/tree/master/consume
This standard defines an environment variable, BUILD_PATH_PREFIX_MAP, that distributions can set centrally and that build tools can consume in order to produce reproducible output.
Before implementing this, you should scan through our checklist to see if you can avoid implementing it.
TODO: the actual proposal
Please read our (TODO specification) for details.
See Standard Environment Variables for more detailed discussion of the rationales behind this mechanism.
Setting the variable
Reading the variable
More detailed discussion
(See Standard Environment Variables for general arguments.)
Comparison to SOURCE_DATE_EPOCH
SOURCE_DATE_EPOCH's underlying information (date of last modification) is a property of the source code, and is therefore a constant reproducible value by definition. By contrast, BUILD_PATH_PREFIX_MAP's underlying information (maps from paths to other paths) is not itself a property of the source code.
Rather, it is roughly the "difference" between build-time path information and a property of the source code. In other words, build tools read paths from the filesystem, then "subtract" BUILD_PATH_PREFIX_MAP from them to get reproducible paths. These might be relative paths, or they could be abstract absolute paths that might not exist on the build machine, but could exist (based on a contract with other tools) on the end user's run-time machine. The tools then output these reproducible paths instead of the build-time path information.
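As a rough illustration of this "subtraction", the following Python sketch rewrites build-time paths using a list of prefix pairs. The function name `map_path`, the in-memory list-of-pairs representation, the first-match-wins rule and the example paths are all assumptions made for illustration; this draft does not yet define how the variable is encoded or how competing prefixes are resolved.

```python
def map_path(path, prefix_map):
    """Rewrite a build-time path using (build_prefix, replacement) pairs.

    Assumption: the first pair whose build_prefix matches wins, and a
    prefix only matches on a whole path component boundary.
    """
    for build_prefix, replacement in prefix_map:
        if path == build_prefix or path.startswith(build_prefix + "/"):
            return replacement + path[len(build_prefix):]
    # No prefix matched: the path is left unchanged.
    return path


# Hypothetical build directory on the build machine.
build_dir = "/home/alice/build/hello-1.0"

# Mapping to a relative path.
print(map_path(build_dir + "/src/main.c", [(build_dir, ".")]))
# prints ./src/main.c

# Mapping to an abstract absolute path that need not exist at build time.
print(map_path(build_dir + "/src/main.c", [(build_dir, "/usr/src/hello-1.0")]))
# prints /usr/src/hello-1.0/src/main.c

# Paths outside the mapped prefix are untouched.
print(map_path("/usr/include/stdio.h", [(build_dir, ".")]))
# prints /usr/include/stdio.h
```

Matching on a component boundary means a sibling directory such as `/home/alice/build/hello-1.0-extra` is not rewritten by accident.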
Why don't we use a reproducible value directly? In fact some buildsystems can do this, and they don't need this variable. For example, some buildsystems and programming languages force a very rigid structure to the source tree, and either both the high-level and low-level build tools are able to determine the relative paths of every source file, or else the high-level tools detect this information and pass it down into the low-level tools.
However, this is only easy to arrange for vertically-integrated build stacks where the whole stack is controlled by just a few parties. For C/C++ and other languages, there are several different buildsystems that each want to work for several different compilers at the same time, so there is a disincentive to add special logic for tool-specific command-line options.
For example, GCC does in fact support a -fdebug-prefix-map option where a high-level build tool (or human) can supply information on "what to subtract". But historically, the only use-case that was imagined for this was debug info (hence its name) and the map does not apply to other things like __FILE__ macros. It's unlikely that a higher-level buildsystem would want to spend the effort to detect appropriate values for this automatically, merely to have nicer debuginfo. More generally, command-line based solutions are hard for us, the Reproducible Builds project, to scale across the whole FOSS ecosystem; see "We'll add a command line flag instead" for more discussion.
This environment variable can therefore also act as a standard interface between parent buildsystems and low-level build tools. The former can pass information about the path structure of the overall software project to the latter. This can even be done without co-ordination: unlike command-line options, programs usually ignore environment variables that they don't recognise. So a parent buildsystem can pre-emptively choose to set this variable, even if not all of its child build tools can read it yet.
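The following Python sketch illustrates why a parent can set the variable pre-emptively. It launches two hypothetical child tools, one that reads the variable and one that has never heard of it, and both run successfully. The helper `run_child`, the two child scripts and the placeholder value are assumptions for illustration only; the value is treated as an opaque string because this draft does not yet define its encoding.

```python
import os
import subprocess
import sys

VAR = "BUILD_PATH_PREFIX_MAP"

# Hypothetical child tool that knows about the variable.
AWARE_CHILD = """
import os
value = os.environ.get("BUILD_PATH_PREFIX_MAP")
if value is None:
    print("aware: variable not set, using build paths as-is")
else:
    print("aware: got " + value)
"""

# Hypothetical child tool that does not know about the variable at all.
UNAWARE_CHILD = """
print("unaware: built normally")
"""


def run_child(code, value=None):
    """Run a child tool, optionally with the variable set in its environment."""
    env = dict(os.environ)
    env.pop(VAR, None)  # start from a known state
    if value is not None:
        env[VAR] = value
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


placeholder = "opaque-placeholder-value"  # not a real encoding
print(run_child(AWARE_CHILD, placeholder))    # the aware tool consumes it
print(run_child(UNAWARE_CHILD, placeholder))  # the unaware tool is unaffected
print(run_child(AWARE_CHILD))                 # unset: falls back gracefully
```

Passing an unrecognised command-line option to the unaware tool would typically make it fail, which is the contrast the paragraph above draws.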
Similarly, our usage of SOURCE_DATE_EPOCH so far has been for the distribution's package builder program to set this value in an envvar, but it is conceivable that buildsystems could set this themselves, e.g. by reading it from VCS or a ChangeLog file in the source tarball, for lower-level build tools to consume.
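A minimal Python sketch of that idea follows: a buildsystem prefers a SOURCE_DATE_EPOCH value it was given, and otherwise tries to derive one from the VCS. The function name `source_date_epoch`, its `environ` parameter and the choice of Git as the VCS are assumptions for illustration; reading a ChangeLog file is not shown.

```python
import os
import subprocess


def source_date_epoch(source_dir=".", environ=None):
    """Return SOURCE_DATE_EPOCH as an int, or None if it cannot be determined.

    Assumption: an already-set environment value takes priority, and the
    fallback is the commit time of the latest Git commit in source_dir.
    """
    env = os.environ if environ is None else environ
    value = env.get("SOURCE_DATE_EPOCH")
    if value:
        return int(value)
    try:
        # %ct is the committer date as a Unix timestamp.
        out = subprocess.run(
            ["git", "-C", source_dir, "log", "-1", "--format=%ct"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None  # git missing, or source_dir is not a repository
    return int(out) if out else None


# A value set by the distribution's package builder is used as-is.
print(source_date_epoch(environ={"SOURCE_DATE_EPOCH": "1500000000"}))
# prints 1500000000
```

A buildsystem could then export the result into the environment of the lower-level tools it invokes.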
History and alternative proposals
FIXME: stuff about -fdebug-prefix-map, DW_AT_producer, etc.