

This page is intended to collect information, ideas and comments related to adding BitTorrent functionality to the downloading of package files by Apt.

Motivation

This project benefits the Debian project and its mirrors, as well as any other developer who wants to host a popular archive but is concerned about bandwidth costs. Once the service is complete and widely deployed, the bandwidth and hardware costs of providing a very large Debian archive to hundreds of thousands of users should be dramatically reduced.

These costs are currently reduced by the voluntary mirroring system Debian uses to help distribute packages. This system has some drawbacks, though, especially as the archive grows. Some mirrors are already feeling the burden of its size, which has led to the introduction of partial mirroring. The system also creates multiple central points of failure for users, as most depend on a single mirror, and it does a poor job of distributing the load evenly, as some mirrors are chosen more often than others. Finally, non-primary mirrors may be slow to update to new versions of packages, and changes to the mirror network propagate slowly because users must update their sources.list files manually.

With a BitTorrent-like system, however, these voluntary mirrors could simply join the swarm of peers for the archive. Each mirror would store only as much data and contribute only as much bandwidth as it can, the swarm would provide multiple redundancies and balance the load automatically, and mirrors could use the bandwidth savings to update their packages more frequently. This would also allow for future growth, both in the size of the archive and in the popularity of Debian.

This was originally proposed and accepted as a Google Summer of Code project in 2006. Unfortunately, that project failed because of time restrictions and its complexity, but some good ideas came out of it.

A similar project was accepted for the 2007 Google Summer of Code, and the resulting code can be found here

A program similar in idea to this one, apt-torrent, has been created by Arnaud Kyheng and is available online: http://sianka.free.fr/. apt-torrent differs in that it makes no changes to the BitTorrent protocol. Instead, it provides a wrapper around the BitTornado package, which it then uses to download the package files.

Another derived BitTorrent protocol is gittorrent. People have discussed ways of using gittorrent to achieve what this proposal suggests, but it is probably a good idea for both efforts to carry on in parallel.

Problems

Though implementing a BitTorrent-like solution to package distribution seems like a good idea, the way BitTorrent currently distributes files makes it unsuitable for a Debian archive. Overcoming these limitations will mostly require modifications to the client and the protocol. The 2006 Google SoC project identified some of these concerns, and more have been added since (if you have a new problem, add it here).

The Protocol

Implementation Details

The Solution

Here are some details on how the problems listed above are solved in the current implementation of the DebTorrent program.

Protocol Solutions

  • a lot of packages are too small:

    • pieces could be of variable size, so that a single piece can hold exactly one package, even a very small one.
  • there are too many packages to create individual torrents for each and the archive is too large to track efficiently as a single torrent:

    • create a torrent for each Packages file, or two if the Architecture: all packages are split out.

    • create a torrent representing subdirectories: map the concept of a hierarchical filesystem on top of torrents.
  • all downloaders of a package need to be aware of all other downloaders:

    • keep torrent boundaries consistent, so that users in a torrent are only interested in other users in the same torrent.
  • the archive is frequently updated:

    • This issue is common to all update methods (FTP, HTTP, rsync, debtorrent) and is being solved by Debian in a manner common to all of them. Package update diffs are now generated on a daily basis. -- lkml
      • The frequently updated archive problem that debtorrent has is not the same. Since all files must be gathered together into a large torrent file, every time the archive is updated so will the torrent. This is not ideal. Updates to Packages.gz files with diffs are an unrelated issue. -- CameronDale
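The variable-size piece idea from the first item above can be illustrated with a small sketch. This is not DebTorrent's code: the function names, the tuple layout, and the sample sizes are assumptions made only for illustration. It contrasts standard BitTorrent-style fixed pieces, which are cut from the concatenation of all files, with a layout in which every package is exactly one piece.

```python
def fixed_pieces(package_sizes, piece_length):
    """Fixed-size layout: files are concatenated and cut into equal pieces.

    Returns (start, length) pairs over the concatenated data, so a piece
    can span several packages.
    """
    total = sum(package_sizes.values())
    return [(start, min(piece_length, total - start))
            for start in range(0, total, piece_length)]


def variable_pieces(package_sizes):
    """Variable-size layout (hypothetical): one piece per package.

    Returns (package, start, length) tuples whose boundaries coincide
    with package boundaries, however small the package is.
    """
    pieces = []
    offset = 0
    for name, size in package_sizes.items():
        pieces.append((name, offset, size))
        offset += size
    return pieces
```

With sizes of 10, 300 and 5 bytes and a fixed piece length of 256, the first fixed piece covers the whole 10-byte package plus part of its neighbour, so a peer that wants only the small package must also fetch unrelated data. In the variable layout each package can be requested and verified on its own.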

Implementation Solutions

Full Proposal

Here are the details of how the DebTorrent program solves the problems above.

Implementation Steps

The initial implementation steps proceeded in this order:

  1. Version 0.1.0: Implement a single variable size piece for each package and add parsing of current Packages files so they can be used unmodified as torrent files.
  2. Version 0.1.1: Use data from dpkg-status to determine which packages/pieces to download.
  3. Version 0.1.2: Implement proxy functionality to receive input from apt about which packages to download, break up large packages into multiple pieces, and download rare pieces using a backup HTTP downloader.
  4. Version 0.1.3: Automate the process to run as a daemon started from /etc/init.d, and package the program as a Debian binary package.
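The first three steps above can be sketched roughly as follows. This is a minimal illustration, not the DebTorrent implementation. It assumes that the Packages file and the dpkg status file both use the usual control-file stanza layout with Package, Filename, Size and Status fields, that "using data from dpkg-status" means selecting the currently installed packages, and that large packages are split at a fixed cap. The function names and the MAX_PIECE value are hypothetical.

```python
MAX_PIECE = 512 * 1024  # hypothetical cap on piece size, in bytes


def parse_stanzas(text):
    """Parse control-file text into a list of {field: value} dicts."""
    stanzas, current, key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, key = {}, None
        elif line[0] in " \t":
            # A continuation line extends the previous field.
            if key is not None:
                current[key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            current[key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def installed_packages(status_text):
    """Names of packages whose Status field ends in 'installed'."""
    return {s["Package"] for s in parse_stanzas(status_text)
            if s.get("Status", "").split()[-1:] == ["installed"]}


def pieces_to_download(packages_text, status_text, max_piece=MAX_PIECE):
    """List (filename, offset, length) pieces for installed packages.

    A package no larger than max_piece is a single piece; a larger one
    is broken into several pieces.
    """
    wanted = installed_packages(status_text)
    pieces = []
    for stanza in parse_stanzas(packages_text):
        if stanza["Package"] not in wanted:
            continue
        size = int(stanza["Size"])
        for offset in range(0, size, max_piece):
            pieces.append((stanza["Filename"], offset,
                           min(max_piece, size - offset)))
    return pieces
```

Reading the unmodified Packages file directly means no separate torrent metadata has to be generated for the piece list in this sketch. A real client would also need the per-package hashes to verify each piece.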

Future steps could then be:

Pros

Cons