This page describes a method used by DebTorrent for solving one of the problems of using a bittorrent-like program to serve a constantly updating archive of files, like the testing and unstable suites of the debian archive. These suites are currently updated twice daily, with plans to update more frequently in the future.
The problem this creates is that a traditional torrent created for files in the archive is identified by an infohash (an SHA1 hash of the file names, sizes and SHA1 hashes of the files). If a single file in the torrent changes, or a new one is added, or an old one removed, the infohash will change, which will create a new torrent and swarm of peers. The peers connected to this torrent will not be able to connect to the peers from the previous torrent, even though they have 99% of the files in common. This will fracture the downloading population into many torrents of very small size, reducing the efficiency of the download (bittorrent is most efficient when the swarm of peers is very large).
To solve this problem requires some fundamental changes to the way bittorrent operates: This was implemented in version 0.1.6, and the current DebTorrent works this way.
- the torrent identifier (infohash) will need to be changed to something that exists for a longer period
- it can still be a hash, but not over the file names and file hashes
- files in the torrent need to be given unique piece numbers that never change or are reused, even if the file is removed from the archive
- new files will be given new piece numbers at the end of the torrent (i.e. the largest numbers)
- the piece numbers of removed files need to be kept and not reused for new files
- eventually the piece numbers assigned will become too large for the number of files in the torrent, and a fresh torrent will need to be generated
- peers need to be able to understand which pieces are where in a torrent
- torrents were previously constructed only from Packages files, which don't contain piece ordering information
- a new file containing information on how the pieces are ordered will be needed
- peers need to be able to communicate with other peers that have a different idea of what files are in the torrent
- they need to drop any have/bitfield information for pieces they don't know about
- they need to assume a neighbor doesn't have pieces when the bitfield is short
The following sections contain more information on some of these changes.
Creating the Torrent
There are several possibilities for choosing the information to hash to create the torrent identifier (infohash). It should be specific to the torrent, and so should contain information about the files in it. Examples are, the fields from the Release file, such as origin, label, codename (sid), version, and suite (unstable); also, the component (contrib) and architecture (amd64) are probably good choices. Additionally, it should be specific to this generation of the infohash, so the date and time of the generation should be included in the hash, as well as the maximum piece size used.
The unique piece numbers will first be assigned in the traditional bittorrent fashion when a new torrent is generated. The files will be sorted in order by their full path name (which is different than the order in the Packages file), and assigned piece numbers sequentially from 0 (zero). On subsequent modification of the files contained in the torrent, new files will be given piece numbers starting with the largest piece number already in the torrent. The new files will again be ordered by their full path name for this assignment, and given new piece numbers in that order. Files that are removed will have their piece numbers kept in reserve, resulting in gaps in the piece numbering of clients using the new torrent. The clients will know where these gaps are, but not the file names that go in them, nor the file's hashes. Modifying files without changing the file name is not allowed.
A good strategy for when to restart the unique piece numbering is necessary. Otherwise the piece numbers would grow forever until a lot of communication would be wasted sending bitfields that are mostly empty, and memory is wasted storing lists that are mostly empty. The strategy could be one of the following:
- when the maximum piece number gets too large (i.e. 100,000)
- only makes sense for large suites (like main), not smaller ones (like contrib/non-free)
- when a certain amount of time has passed (i.e. every month)
- fairly arbitrary
- when a project milestone is reached (i.e. when a new suite is released)
- relies on regular milestones that are not too far apart
- when the number of empty pieces gets too large (i.e. half the torrent is empty)
Probably the last strategy is the best, as it does not rely on any other factor than the amount of wastage in the torrent. See the discussion below for a more in depth analysis of when to restart piece numbering.
New Torrent Files
A new file will need to be distributed to peers to supplement the information available in Packages files to create this new torrent. These files (one per torrent) will contain the information needed by a DebTorrent client to identify the torrent to others, and to order the pieces appropriately for the download. This information will be in the RFC 2822 header field format, as used by the Release files. It will begin with the fields shown below:
Torrent: the 40 byte hex representation of the SHA1 torrent identifier for this torrent
OriginalDate: the date from the Release file when this torrent was originally created
Date: the date from the Release file used to create this version of the torrent
PieceSize: the maximum number of bytes in a single piece (defaults to 512 KiB)
NextPiece: the next piece index to use for adding a new file to the torrent (starting from 0)
OriginalPieces: the number of pieces that were in the torrent when it was originally created
Codename: copied from the Release file
Suite: copied from the Release file
Component: the component (main/contrib/non-free) this torrent is for
Architecture: the specific architecture this torrent is for (may be 'all', see below)
Tracker: the announce address of the tracker to contact to get peers for this torrent
TorrentHashFields: a space separated list of the header fields above to hash to create the torrent identifier (defaults to 'Codename Suite Component Architecture PieceSize OriginalDate')
The last entries in the torrent file will be a list of the file name and starting piece numbers for the files. This will have a field name of PieceNumbers, and will consist of one file per line, listing first the starting piece number (right justified) followed by the file name.
Here is an example of what the complete torrent file should look like:
Torrent: 1a8d7c18b3b5d89fedd0b113f251b6c1002b8710 OriginalDate: Sun, 08 Apr 2007 10:41:22 UTC Date: Tue, 07 Aug 2007 22:53:09 UTC PieceSize: 524288 NextPiece: 2486 OriginalPieces: 2460 Codename: sid Suite: unstable Component: contrib Architecture: amd64 Tracker: http://dttracker.debian.net:6969/announce TorrentHashFields: Codename Suite Component Architecture PieceSize OriginalDate PieceNumbers: 0 pool/contrib/a/acx100/acx100-source_20060521-3_amd64.deb 1 pool/contrib/a/alien-arena/alien-arena_6.05-1_amd64.deb 10 pool/contrib/a/alien-arena/alien-arena-dbg_6.05-1_amd64.deb 11 pool/contrib/a/alien-arena/alien-arena-server_6.05-1_amd64.deb 25 pool/contrib/a/alien-arena/alien-arena-server-dbg_6.05-1_amd64.deb 36 pool/contrib/a/amoeba/amoeba_1.1-19_amd64.deb 39 pool/contrib/a/arbortext-catalog/arbortext-catalog_1.01-3.3_amd64.deb 41 pool/contrib/a/argouml/argouml-doc_0.19.6-2.1_amd64.deb 57 pool/contrib/a/aspectj/aspectj_1.1.1-2_amd64.deb ... 2456 pool/contrib/x/xmmplayer/xmms-xmmplayer_0.3.3-1+b1_amd64.deb 2459 pool/contrib/x/xserver-xorg/xserver-xorg_0.10.6-1_amd64.deb 2460 pool/contrib/a/arbortext-catalog/xt-catalog_1.01-3.3_amd64.deb 2476 pool/contrib/x/xtrs/xtrs_4.9c-1_amd64.deb 2477 pool/contrib/y/ydpdict/ydpdict_0.66-1_amd64.deb 2478 pool/contrib/a/alsa-tools/alsa-firmware-loaders_1.0.14-1_amd64.deb 2485 pool/contrib/a/argouml/argouml_0.19.6-2.1_amd64.deb
The actual torrent files for the testing (lenny) and unstable (sid) suites are available online, as well as a status page showing the growing size of the number of pieces.
Currently, the limitations of having these files kept separately from the mirrors and the Packages files that reference them requires some special handling. Since a user might have obtained a Packages file from a slightly out-of-date mirror, but then got the most current torrent file, the unique piece numberings need to be kept for a time until all mirrors can be sure to be lined up. Currently old references to unique piece numbers are kept until the torrent is regenerated, which allows out-of-date mirror users to extract the old information they need, and all users ignore piece numbers they aren't interested in. The restarting of a torrent will cause trouble though for out-of-date mirror users if they try to download a file that is no longer in the up-to-date archive. This file will not be in their torrent, since it is no longer listed in the unique piece numbering file, so trying to install it will result in a 'file not found' error. Hopefully in the future, this data will be kept with (or in) the Packages files, so as not to cause any problems (see 'Adding this information to the Packages/Release files' below).
Architecture:all packages
Since architecture:all packages are used by everybody, it is important to keep all the downloaders of them together in a single torrent, rather than splitting them up into the individual architecture torrents. Therefore, an extra torrent of only architecture:all packages will be created and added to, just as the other architecture torrents are. The architecture:all packages will then not be listed in the individual architecture torrents (i,e. i386). Creating the 'all' torrent is currently quite simple, since all architectures contain exactly the same architecture:all packages, the torrent can be created identically for each of them. However, this behavior can not be depended on in the future, so a better strategy is needed.
The architecture:all torrent will therefore be created from all of the other architecture's Packages files. It will contain an entry for each architecture:all file found in any of the Packages files. Thus, if an architecture:all file exists only in the i386 Packages file, clients downloading the 'all' torrent for other architectures will simply ignore this entry, and so the file will be in a piece gap just like removed files are. When new architecture:all files are added to the archive, these files are added to the end of the 'all' torrent just as for the others, the only exception being that all the new architecture:all files added to all the architectures will be added to the same 'all' torrent file.
Adding this information to the Packages/Release files
Eventually, it would be favorable to include a lot of the information in the torrent file in the standard Packages and Release files to avoid duplication and avoid possible errors in retrieving the data.
The largest part of the torrent file, each file's starting piece number, would be easily included in the Packages file using a single StartingPiece field. This would save a lot of storage space, as the largest part of the torrent file is mostly a duplication of the file names. This would also work for the architecture:all packages, except that their numbering would be separate from the architecture-specific files.
The remainder of the torrent data could be mostly included in the main Release file, including a multi-line field indicating the torrent identifier for each torrent. Alternatively, the torrent data could be put into the architecture specific torrent files (e.g. debian/dists/sid/main/binary-alpha/Release), as those files don't seem to be used for anything anyway. (Then a new Release file would be needed for the 'all' torrent.)
Peer Protocol
Implementing unique piece numbers means that a single torrent swarm will contain peers who have a different idea of how many pieces there are in the complete torrent. However, they both know exactly where the piece numbers that they do have in common belong in the torrent, even if one of them doesn't know exactly what file they belong to (but that's OK, since that peer doesn't have, nor want to download the unknown file anyway). It's important that peers do not drop these connections unnecessarily, which requires some modifications to the communication protocol between peers.
- peers no longer drop connections to peers that specify a bitfield of the wrong length
- peers that receive a shorter than expected bitfield will assume all missing bits are 0
- peers that receive a longer than expected bitfield will disregard the extra information
- peers no longer drop connections to peers that specify that they have an unknown piece
- peers that receive a HAVE message for a piece they aren't aware of will disregard it
The rest of the protocol should be fine for these peers. A peer will never advertise that it has a piece of a file it doesn't know about, so it will never get a request to upload that piece (if it does, the connection should be dropped).
Compressed bitfields
Considering the already large, and now growing size of the bitfield, it may be beneficial to compress it before sending it to another peer to minimize the bandwidth. This would involve adding a new COMPRESSED-BITFIELD message, that would be used by a peer when sending a bitfield longer than 1000. The compressed bitfield would be created using the gzip compression module, and should result in a large savings since the bitfield will be mostly filled with 0's.
Discussion
Restarting the unique piece numbering
An analysis of the increasing number of pieces used by this strategy has been performed on data from the Debian archive's testing and unstable suites for 3 months, from May 9th to August 7th, 2007. This data is similar to that shown here for the unstable suite, but contains more information as the number of new pieces can not be directly determined from the total size of the new files.
Starting the unique piece numbering strategy on May 9th, and adding pieces to the end of the torrents for these 3 months, would result in torrents with the number of pieces shown in the following 2 graphs (main = black, contrib = blue, non-free = red, architecture-specific = solid, architecture:all = dashed):
(Note: the reason for the lack of new pieces during May was a pending transition of glibc which occurred on June 1st.)
(Note: the one lonely solid black line is for main-hurd-i386, which has half as many pieces as the other architectures.)
The proposed restarting of piece numbering strategy is to wait until the torrent is half empty. Or, rather that the number of pieces in the torrent has approximately doubled from what it started at. To better understand the effects of this strategy, it is more informative to observe the percentage increase in the number of pieces for each torrent, which will prevent the large number of pieces in main from overwhelming the smaller torrents such as contrib/non-free. Here are the same 2 graphs, shown as a percentage increase (main = black, contrib = blue, non-free = red, architecture-specific = solid, architecture:all = dashed):
The large increase in one of the contrib (red) torrents is from the addition of a large package to the non-free s390 architecture, which more than doubled the size of the torrent. This shows the drawback of this approximation for the wasted number of pieces, since these pieces are not actually wasted, as they were never in the torrent to begin with and so left behind no gap.
Other than that discrepancy, the plan of restarting the piece numbering once a torrent reaches twice it's original size seems sound. For unstable, this would mean a torrent would last almost 2 months, while a testing torrent would probably last about twice as long. In both cases the total number of pieces would max out at about 60,000, though this may change if the archive size grows considerably.
The elimination of seeds
An important difference between this proposal and the standard bittorrent, is that there is no longer a possibility to be a seed. This is because, once a torrent has a piece gap, and new pieces added to it, it is no longer possible for a single peer to have all the pieces. This is probably not an issue though, since seeding is very difficult in a DebTorrent system already, due to the very large size of the archive, and the lack of a peer's interest in all of the packages. The only peers who would have all the latest pieces would be mirrors, but even they would lack any old pieces that have been removed.
Bittorrent only depends on seeding for the initial seed to upload pieces into the system. Since DebTorrent falls back to a backup HTTP download from a mirror, this initial seed requirement is no longer an issue, as the first person to request an updated package will become the seed for it in the system.
There are some optimizations, especially in the tracker, that will have to be rethought because of this. For example, the tracker usually separates seeds and leechers so as to not return other seeds to a seed when it requests peers to connect to.
Not your problem (Packages.gz rather than individual archives)
The issue of packages changing rapidly is an inherent problem that is common to all download methods - http, ftp, rsync, debtorrent, you name it. It is therefore "not your problem to solve". Debian has already recognised the seriousness of the issue, and has created rred, and a system for downloading package diffs, generated daily, for the testing, unstable and experimental archives.
The bottom line is this: attempting to solve this issue, which is common to all download methods, is not adviseable at this time - it's far too complicated an optimisation, and it is an optimisation, when there are more important issues to solve.
In my opinion, this is the most important issue to solve for DebTorrent right now. If you go to the tracker page, you will see that there are over 500 peers, but they are distributed into over 300 individual torrents, so that most of the torrents only have a single peer. That means they aren't downloading anything from peers, but all from mirrrors, so DebTorrent is saving the mirrors nothing. There are a few torrents with around 10 peers, which are the people currently tracking unstable/testing and updating every day (so they all have the latest torrent).
Unique piece numbers will make rapidly changing torrents like unstable and testing last for much longer (a month or 2), so the number of peers will be much higher. This has nothing to do with rred or package diffs (which are actually Packages diffs, that is they aren't for packages but for the Packages.gz files that list all the packages), which only solve the problem of getting an updated Packages.gz file on a daily basis. --CameronDale