⇤ ← Revision 1 as of 2007-05-02 22:09:55
Size: 110
Comment: Initial creation
|
Size: 6379
Comment: First draft without implications
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
This page will be used to describe the proposed changes to apt-ftparchive to support the DebTorrent program. | This page describes the proposed changes to apt-ftparchive to support the DebTorrent program. == Packages files == ==== Motivation ==== Packages files are currently used for DebTorrent metadata, since they contain all the necessary information: the file path, the file size, and the SHA1 sum of the file. However, this means that all files must be a single piece, as each piece needs an SHA1 sum generated for it. Though this does work for DebTorrent, the sharing of pieces is not as effective as it should be, since very large files (up to 160MB) will need to be downloaded in their entirety before they can be shared with others. A more effective sharing can be obtained by breaking up the files larger than a threshold into smaller pieces, and generate the SHA1 sum for the pieces and add it to the Packages file. ==== Proposal ==== Break up files larger than a given threshold into multiple pieces, and generate an SHA1 sum for the pieces that is then included in the Packages file. This would mean adding a new field to the Packages file, "SHA1-pieces", which would be a list of SHA1 sums and piece sizes, with one piece per line. This new field would only be present in packages that have a file larger than a predetermined threshold. An example, using a piece size and threshold of 1 MB is below (some of the normal fields present in Packages files have been omitted for brevity). {{{ Package: foo Size: 5341873 SHA1: 38170c08cb458fd4879c34b6f608294c50312bbb SHA1-pieces: e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e 1048576 7448d8798a4380162d4b56f9b452e2f6f9e24e7a 1048576 a3db5c13ff90a36963278c6a39e4ee3c22e2a436 1048576 9c6b057a2b9d96a4067a749ee3b3b0158d390cf1 1048576 5d9474c0309b7ca09a182d888f73b37a8fe1362c 1048576 ccf271b7830882da1791852baeca1737fcbe4b90 98993 Package: bar Size: 72856 SHA1: 9425fa8de16f6283365f6bee87f405da16a203e6 Package: ... }}} In theory, the piece size and threshold could be chosen to be any power of 2, but in practice values of 256 KB, 512 KB, and 1 MB are reasonable values. The smaller it is set at, the more effective the file sharing will be, but the more information will need to be added to Packages files and the more pieces will need to be tracked by DebTorrent (memory issue). ==== Implications ==== == Release files == ==== Motivation ==== In addition to the information above in the Packages files, it would be useful to have some information stored in the Release file. The best example of this is the URL for the DebTorrent tracker, which all downloaders will need to contact to begin the download. Though this information could be left to a default value, adding it here allows any person running a debian-style archive to specify their own tracker here, to both monitor their own downloaders and reduce the load on the official (default) debian tracker. The Packages files themselves could also be downloaded through DebTorrent, further minimizing the load on mirrors. This is possible with the current information in the Release file, by treating each Packages file as a single piece with the SHA1 sum given in the Packages file. This would again suffer from the same ineffective downloading mentioned above for large files in the archive, as some of the Packages files are quite large (20 MB). Again, these large files could be broken down into multiple pieces, and SHA1 sums generated for the pieces, to make this downloading more effective. ==== Proposal ==== Add a new field to the Release file that specifies the tracker URL for the archive. {{{ Origin: Debian Label: Debian Suite: unstable Codename: sid Tracker: http://dttracker.debian.net:6969/announce Date: Tue, 01 May 2007 20:13:19 UTC Architectures: alpha amd64 arm hppa hurd-i386 i386 ia64 m68k mips mipsel powerpc s390 sparc Components: main contrib non-free Description: Debian Unstable - Not Released }}} The second part could be accomplished in 2 ways: All in the Release file:: Add information to the Release file for the large Packages files broken up into multiple pieces, each with an SHA1 sum. This would be similar to the "SHA1-pieces" field added to the Packages files for large files. As SHA1 sums for Packages files are currently listed on a single line per file, this would probably mean adding a new section at the end of the Release file, where each Packages file larger than the threshold would have the following entries (for 1 MB pieces). {{{ Path: main/binary-alpha/Packages Size: 20497160 SHA1-pieces: e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e 1048576 7448d8798a4380162d4b56f9b452e2f6f9e24e7a 1048576 a3db5c13ff90a36963278c6a39e4ee3c22e2a436 1048576 ... 9c6b057a2b9d96a4067a749ee3b3b0158d390cf1 1048576 5d9474c0309b7ca09a182d888f73b37a8fe1362c 1048576 ccf271b7830882da1791852baeca1737fcbe4b90 574216 Path: main/binary-alpha/Packages.gz Size: 5969270 SHA1-pieces: 9425fa8de16f6283365f6bee87f405da16a203e6 1048576 ... }}} This has the advantage of being simpler to manage and read, and possibly simpler to implement, as all the information is stored in a single location. However, it will add a lot of size to the Release file, which would then need to be downloaded by everyone accessing the archive. Separate files:: The information for Packages files broken up into pieces could be stored separately in a file for each Packages file. The file would contain the same information as the Release file in the previous proposal. For eaxmple, the main/binary-alpha/Packages file would have alongside it a main/binary-alpha/Packages.pieces file containing (for 1 MB pieces) {{{ SHA1-pieces: e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e 1048576 7448d8798a4380162d4b56f9b452e2f6f9e24e7a 1048576 a3db5c13ff90a36963278c6a39e4ee3c22e2a436 1048576 ... 9c6b057a2b9d96a4067a749ee3b3b0158d390cf1 1048576 5d9474c0309b7ca09a182d888f73b37a8fe1362c 1048576 ccf271b7830882da1791852baeca1737fcbe4b90 574216 }}} This adds the complexity of many additional files, but also some flexibility in terms of adding additional features in the future. The size added per person accessing the archive is also minimized, as they will not download additional information on architectures they aren't interested in, nor will downloaders not using DebTorrent need to download this information at all. ==== Implications ==== |
This page describes the proposed changes to apt-ftparchive to support the DebTorrent program.
Packages files
Motivation
Packages files are currently used for DebTorrent metadata, since they contain all the necessary information: the file path, the file size, and the SHA1 sum of the file. However, this means that all files must be a single piece, as each piece needs an SHA1 sum generated for it. Though this does work for DebTorrent, the sharing of pieces is not as effective as it should be, since very large files (up to 160MB) will need to be downloaded in their entirety before they can be shared with others. A more effective sharing can be obtained by breaking up the files larger than a threshold into smaller pieces, and generate the SHA1 sum for the pieces and add it to the Packages file.
Proposal
Break up files larger than a given threshold into multiple pieces, and generate an SHA1 sum for the pieces that is then included in the Packages file. This would mean adding a new field to the Packages file, "SHA1-pieces", which would be a list of SHA1 sums and piece sizes, with one piece per line. This new field would only be present in packages that have a file larger than a predetermined threshold. An example, using a piece size and threshold of 1 MB is below (some of the normal fields present in Packages files have been omitted for brevity).
Package: foo Size: 5341873 SHA1: 38170c08cb458fd4879c34b6f608294c50312bbb SHA1-pieces: e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e 1048576 7448d8798a4380162d4b56f9b452e2f6f9e24e7a 1048576 a3db5c13ff90a36963278c6a39e4ee3c22e2a436 1048576 9c6b057a2b9d96a4067a749ee3b3b0158d390cf1 1048576 5d9474c0309b7ca09a182d888f73b37a8fe1362c 1048576 ccf271b7830882da1791852baeca1737fcbe4b90 98993 Package: bar Size: 72856 SHA1: 9425fa8de16f6283365f6bee87f405da16a203e6 Package: ...
In theory, the piece size and threshold could be chosen to be any power of 2, but in practice values of 256 KB, 512 KB, and 1 MB are reasonable values. The smaller it is set at, the more effective the file sharing will be, but the more information will need to be added to Packages files and the more pieces will need to be tracked by DebTorrent (memory issue).
Implications
Release files
Motivation
In addition to the information above in the Packages files, it would be useful to have some information stored in the Release file. The best example of this is the URL for the DebTorrent tracker, which all downloaders will need to contact to begin the download. Though this information could be left to a default value, adding it here allows any person running a debian-style archive to specify their own tracker here, to both monitor their own downloaders and reduce the load on the official (default) debian tracker.
The Packages files themselves could also be downloaded through DebTorrent, further minimizing the load on mirrors. This is possible with the current information in the Release file, by treating each Packages file as a single piece with the SHA1 sum given in the Packages file. This would again suffer from the same ineffective downloading mentioned above for large files in the archive, as some of the Packages files are quite large (20 MB). Again, these large files could be broken down into multiple pieces, and SHA1 sums generated for the pieces, to make this downloading more effective.
Proposal
Add a new field to the Release file that specifies the tracker URL for the archive.
Origin: Debian Label: Debian Suite: unstable Codename: sid Tracker: http://dttracker.debian.net:6969/announce Date: Tue, 01 May 2007 20:13:19 UTC Architectures: alpha amd64 arm hppa hurd-i386 i386 ia64 m68k mips mipsel powerpc s390 sparc Components: main contrib non-free Description: Debian Unstable - Not Released
The second part could be accomplished in 2 ways:
- All in the Release file
- Add information to the Release file for the large Packages files broken up into multiple pieces, each with an SHA1 sum. This would be similar to the "SHA1-pieces" field added to the Packages files for large files. As SHA1 sums for Packages files are currently listed on a single line per file, this would probably mean adding a new section at the end of the Release file, where each Packages file larger than the threshold would have the following entries (for 1 MB pieces).
Path: main/binary-alpha/Packages Size: 20497160 SHA1-pieces: e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e 1048576 7448d8798a4380162d4b56f9b452e2f6f9e24e7a 1048576 a3db5c13ff90a36963278c6a39e4ee3c22e2a436 1048576 ... 9c6b057a2b9d96a4067a749ee3b3b0158d390cf1 1048576 5d9474c0309b7ca09a182d888f73b37a8fe1362c 1048576 ccf271b7830882da1791852baeca1737fcbe4b90 574216 Path: main/binary-alpha/Packages.gz Size: 5969270 SHA1-pieces: 9425fa8de16f6283365f6bee87f405da16a203e6 1048576 ...
This has the advantage of being simpler to manage and read, and possibly simpler to implement, as all the information is stored in a single location. However, it will add a lot of size to the Release file, which would then need to be downloaded by everyone accessing the archive. - Separate files
- The information for Packages files broken up into pieces could be stored separately in a file for each Packages file. The file would contain the same information as the Release file in the previous proposal. For eaxmple, the main/binary-alpha/Packages file would have alongside it a main/binary-alpha/Packages.pieces file containing (for 1 MB pieces)
SHA1-pieces: e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e 1048576 7448d8798a4380162d4b56f9b452e2f6f9e24e7a 1048576 a3db5c13ff90a36963278c6a39e4ee3c22e2a436 1048576 ... 9c6b057a2b9d96a4067a749ee3b3b0158d390cf1 1048576 5d9474c0309b7ca09a182d888f73b37a8fe1362c 1048576 ccf271b7830882da1791852baeca1737fcbe4b90 574216
This adds the complexity of many additional files, but also some flexibility in terms of adding additional features in the future. The size added per person accessing the archive is also minimized, as they will not download additional information on architectures they aren't interested in, nor will downloaders not using DebTorrent need to download this information at all.