Differences between revisions 1 and 11 (spanning 10 versions)
Revision 1 as of 2016-10-07 14:03:44
Size: 60
Editor: Infinity0
Comment: add link to bug
Revision 11 as of 2017-04-22 04:07:48
Size: 7197
Editor: PaulWise
Comment: page got renamed
Currently, this page is a high-level overview of our design of components that use [[ReproducibleBuilds/BuildinfoFiles|buildinfo files]]. Working together, these components provide various security guarantees on publicly-distributed binaries, relating to reproducibility.

Most of the details are yet to be worked out, and this page will be updated once that happens. In particular, discussion with ''ftpmasters'' is happening in Bug:763822.

<<TableOfContents()>>

== Overview ==

The characters in our story:

Servers:

 * One <<FootNote("One" means in the sense of who controls the keys; there might be a CDN or mirror network that duplicates the contents.)>> '''archive'''. This '''accepts''' signed-buildinfo files from ''developers'', and '''publishes''' a small collection of re-signed-buildinfo files to ''rebuilders'' and ''non-building clients''.

 * Several '''sig-repos''' ("signed-buildinfo repository"). These '''accept''' signed-buildinfo files from ''developers'' and ''rebuilders'', and '''publish''' all of those that they accept to ''clients''. They are very similar to PGP keyservers, but instead of holding signatures-on-keys (and keys), they hold signed-buildinfo files. Eventually these could submit their contents to, or be, one or more transparency logs.

Clients:

 * Many '''developers''', who generate signed-buildinfo files and '''push''' them to an ''archive'' and the ''sig-repos''.

 * Many '''rebuilders'''. These '''pull''' data from the ''archive'', attempt to reproduce the builds, then generate signed-buildinfo files and '''push''' them to the ''sig-repos'' (cf. the workflow of developers). Some might be continuous integration services, some might be run manually.

 * Many '''non-building clients'''. These '''pull''' data from the ''archive'' and the ''sig-repos'', to gain confidence that what they install is reproducible, ''without rebuilding'' it themselves.

Guide to verbs:

 * Push-servers '''accept''' things from clients that '''push''' to them.
 * Pull-servers '''publish''' things to clients that '''pull''' from them.

{{attachment:actors.png}}

source: [[attachment:actors.dot]]


== Details ==

=== Archive ===

Initially, we plan to collect all buildinfo files for a given architecture into one `Buildinfos-$arch.xz` file. This is an easy approach that the archive mirror network can cope with. Later on, we might think about dividing this up so that one can get the data in a more fine-grained manner, but the initial proposal seems able to satisfy our other goals without too much overhead.

The main concern here is that any rebuilder who wants to rebuild even one binary package will need to download the whole Buildinfos file for their architecture (or "all" if they are building an arch:all package). We have made some measurements and this is about 9MB for each architecture with xz (about 250MB uncompressed, about 20MB with gzip). So it is not too much of a burden - it should take less than a minute to download this on an average modern internet connection.
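As a rough check of the "less than a minute" estimate, the arithmetic can be sketched as below. The file sizes are the measurements quoted above; the link speeds are illustrative assumptions, not measurements.

```python
def download_seconds(size_mb: float, link_mbit_per_s: float) -> float:
    """Seconds to transfer size_mb megabytes over a link of link_mbit_per_s megabits/s."""
    # 1 megabyte = 8 megabits; protocol overhead is ignored in this sketch.
    return size_mb * 8 / link_mbit_per_s


# Sizes come from the measurements above; the link speeds are assumed values.
SIZES_MB = {"xz": 9, "gzip": 20, "uncompressed": 250}
ASSUMED_LINKS_MBIT = (2, 10, 50)

for label, size_mb in SIZES_MB.items():
    for mbit in ASSUMED_LINKS_MBIT:
        secs = download_seconds(size_mb, mbit)
        print(f"{label:>12} at {mbit:>2} Mbit/s: {secs:7.1f} s")
```

Under these assumptions the xz-compressed file stays under a minute even on a 2 Mbit/s link, while the uncompressed data would not.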

If we collect all architectures into one file, this download would be much larger, and would likely discourage typical users from performing rebuilds. In the other direction, collecting these into a per-source-package `Buildinfos-$src-$ver.xz` might put extra resource strain on the mirror network due to the large number of files that rsync must `stat`. There is a good chance that the mirror network ''will'' be able to cope with this, but it requires more discussion between different teams, so for now we've chosen to leave this for future work: 9MB seems small enough.

TODO: import dkg's suggested "validation" steps

The archive should make a strongly-attributable statement that (a) "this is all of the buildinfo files for this release", by including the names and hashes of the `Buildinfos-$arch.xz` in the `Release` file, and (b) "these are the buildinfo files for this package", by including the names and hashes of the signed-buildinfo files in the `Packages` indices.
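To illustrate statement (a), here is a minimal sketch of how a name-and-hash entry for a `Buildinfos-$arch.xz` file could be produced. It is modelled on the hash, size and path layout of the existing checksum lines in `Release`; the path `main/Buildinfos-amd64.xz` and the function name are hypothetical, and the real layout is still to be decided.

```python
import hashlib


def release_entry(path: str, data: bytes) -> str:
    """Format one checksum line: leading space, SHA256 hex digest, size, path."""
    digest = hashlib.sha256(data).hexdigest()
    return f" {digest} {len(data)} {path}"


# Hypothetical path and placeholder contents, for illustration only.
example = release_entry("main/Buildinfos-amd64.xz", b"placeholder contents")
print("SHA256:")
print(example)
```

A client that trusts the signature on `Release` can then recompute the hash of the file it downloaded and compare it with this entry.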

(Hashing the signed file allows for lookup in a sig-repo later, and is a strongly-attributable statement that the archive processed the buildinfo file as signed by ''that particular developer/buildd'' and not by someone else. This does ''not'' imply the buildinfo file is correct or true; clients (non-building and rebuilders) are still free to request as many buildinfo files that match a given binary hash (of the build output) as they want from a sig-repo, to verify the claim or to find independent parties that agree with it.)
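The two kinds of lookup described here (by hash of the signed file, and by hash of the build output) can be sketched with an in-memory index. This is only an illustration of the data model: the class and field names are hypothetical, and signature verification is deliberately left out.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedBuildinfo:
    signer: str          # hypothetical: identifier of the signing key
    binary_sha256: str   # hash of the build output that the file claims
    raw: bytes           # the signed file exactly as it was uploaded

    @property
    def file_sha256(self) -> str:
        """Hash of the signed file itself, as the archive would list it."""
        return hashlib.sha256(self.raw).hexdigest()


class SigRepo:
    """Toy sig-repo: stores signed-buildinfo files and indexes them two ways."""

    def __init__(self) -> None:
        self._by_file = {}
        self._by_binary = defaultdict(list)

    def accept(self, item: SignedBuildinfo) -> None:
        # A real sig-repo would verify the signature before storing anything.
        self._by_file[item.file_sha256] = item
        self._by_binary[item.binary_sha256].append(item)

    def lookup_file(self, file_sha256: str):
        """Find the exact signed file that the archive referred to, if held."""
        return self._by_file.get(file_sha256)

    def lookup_binary(self, binary_sha256: str):
        """Find every signed claim about one particular build output."""
        return list(self._by_binary.get(binary_sha256, []))
```

The first lookup answers "which signed file did the archive process?", the second answers "who else claims to have built this binary?".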

For more background information on the Debian archive, see [[DebianRepository/Format]].

=== Signed-buildinfo repository ===

TBD

Adapt a keyserver?

=== Developer ===

TBD

`dput` should be patched to upload a buildinfo file to a sig-repo.

=== Rebuilders ===

TBD

This functionality could be added to `reprotest`, e.g. as a background service.

=== Non-building clients ===

TBD

One could configure `apt` to contact sig-repos and ask for a minimum number of
independent signatures before installing packages.
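A minimal sketch of such a threshold check follows. It does not represent any existing `apt` interface: the function name, the `(signer, binary hash)` claim format and the trusted-signer set are all assumptions, and it shows just one possible policy.

```python
def enough_independent_signers(claims, binary_sha256, trusted, minimum):
    """Return True if at least `minimum` distinct trusted signers vouch for the binary.

    claims: iterable of (signer, claimed_binary_sha256) pairs from sig-repos,
            assumed to have had their signatures verified already.
    trusted: set of signers that the user considers independent and trustworthy.
    """
    signers = {
        signer
        for signer, claimed in claims
        if claimed == binary_sha256 and signer in trusted
    }
    # Using a set means repeated submissions by one signer count only once.
    return len(signers) >= minimum


# Hypothetical data: bob submitted twice, mallory is not trusted.
claims = [("alice", "h1"), ("bob", "h1"), ("bob", "h1"), ("mallory", "h1")]
print(enough_independent_signers(claims, "h1", {"alice", "bob"}, 2))
```

How to decide which signers count as "independent" is the hard part, and is not addressed by this sketch.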


== Security considerations ==

Due to resource constraints - i.e. a design optimised for binary distribution but not signature distribution - archives are not expected to store buildinfo signatures by the original developer. Instead, they can re-sign a whole batch of buildinfo files at once, after doing basic sanity checks on them - e.g. to check that the developer isn't lying about them - and publish this. One may raise the point that this batch is redundant given the sig-repos, but it actually helps to avoid some attacks:

 * Fake buildinfo files presented as "from the archive". Yes, these would not be signed by the archive - but if the archive does not officially publish a signed version, there is no way to ''distinguish'' a legitimate one vs a fake one.

 * Sig-repos getting DoS'd by buildinfo files for junk. They may instead accept only buildinfo files that build a source package that was actually published by the archive. Of course, they MUST accept (subject to self-protection vs DoS) buildinfo files with binary hashes that contradict what the archive said - that is the whole point of reproducible-builds.
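The sig-repo filter from the second point can be sketched as follows. The function name, the `(source, version)` key and the size limit are assumptions made for illustration. Note that the binary hashes are deliberately not consulted, so submissions that contradict the archive are still accepted.

```python
def should_accept(source, version, size_bytes, published_sources, max_bytes=1_000_000):
    """Decide whether a sig-repo stores a submitted signed-buildinfo file.

    published_sources: set of (source, version) pairs that the archive published.
    max_bytes: assumed self-protection limit against oversized submissions.
    """
    if size_bytes > max_bytes:
        return False  # self-protection vs DoS
    # Only the source package is checked; the claimed binary hashes may
    # agree or disagree with the archive, and both cases are accepted.
    return (source, version) in published_sources


# Hypothetical archive contents.
published = {("hello", "2.10-1")}
print(should_accept("hello", "2.10-1", 4096, published))
print(should_accept("junk", "1.0", 4096, published))
```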

To prevent the archive framing them for generating a false or bad buildinfo file, developers MUST publish their own signed-buildinfos to one or more sig-repos. Developers must do this directly from their own machines, rather than relying on the archive to forward this - since a malicious archive could just drop it. Again, yes, nobody can forge the developer's signature on a buildinfo file, but if there is no signed version in public distribution, then there is no way to ''distinguish'' a legitimate one vs a fake one.

We do not yet attempt to define what sort of logic non-building clients should perform, in order to classify a "safe" vs an "unsafe" binary. This (a) does not affect the rest of our system, and (b) is a hard problem to solve, and would require more real-world data and research. The strictness of the policy will depend on the user's security needs.
