A project proposal for the Summer of Code 2006

General Idea


The Debian package management frontend apt mainly uses HTTP to download packages for installation. Unless configured otherwise, it uses Debian's main mirror, ftp.debian.org, as its package repository. Because of the sheer number of users, this saturates the bandwidth of that host[1]. We propose to increase download speeds for users by using a GeoIP[2] database to redirect clients to a nearby mirror, thereby distributing the load more evenly over the existing mirror network. In addition, the redirection statistics can be used to pinpoint regions where more mirrors are needed.
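As a rough illustration of how redirection statistics might reveal under-served regions, the following Python sketch divides the number of redirected requests per country by the number of mirrors in that country and ranks countries by this load. The input format (a log of country codes plus a mirror count per country) is an assumption made for the example, not an existing Debian data format.

```python
from collections import Counter


def rank_underserved(redirect_log, mirrors_per_country):
    """Rank countries by redirected requests per available mirror.

    redirect_log: iterable of country codes, one per redirected request
        (hypothetical log format).
    mirrors_per_country: dict mapping country code -> number of mirrors.
    Countries without any mirror get an infinite load and sort first,
    since every request from there has to be served from abroad.
    """
    requests = Counter(redirect_log)
    ranking = []
    for country, count in requests.items():
        mirrors = mirrors_per_country.get(country, 0)
        load = float("inf") if mirrors == 0 else count / mirrors
        ranking.append((country, load))
    # Highest load per mirror first; ties broken by country code.
    ranking.sort(key=lambda item: (-item[1], item[0]))
    return ranking


if __name__ == "__main__":
    log = ["de", "de", "de", "de", "br", "br", "br", "nz"]
    mirrors = {"de": 4, "br": 1}
    for country, load in rank_underserved(log, mirrors):
        print(country, load)
```

A country that appears near the top of this ranking would be a candidate for an additional mirror.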

Background Information


The Debian project runs a network of mirror hosts to distribute its components and packages. Some of these mirrors are updated via a push mechanism[3]; others are responsible for keeping themselves synchronized with the master mirror. All mirrors are listed in a mirror database[4].

Too many users rely on the default entry http://ftp.debian.org/ in /etc/apt/sources.list. This exposes that host to an unnecessarily high number of requests, which could just as well be served by regional Debian archive mirrors.

Objectives


The aim of this project is to develop a redirection mechanism that looks up the location of the requesting IP address and refers the client to a geographically closer mirror. This regional mirror is chosen from the Debian mirror list based on, e.g., GeoIP data.
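A minimal sketch of the selection step, assuming the GeoIP lookup has already turned the client's IP address into approximate coordinates and that each mirror entry carries coordinates too (the hostnames and positions below are hypothetical, and the mirror list does not necessarily record coordinates). It picks the mirror with the smallest great-circle (haversine) distance.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Mirror:
    host: str
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_mirror(client_lat, client_lon, mirrors):
    """Return the mirror geographically closest to the client."""
    return min(mirrors,
               key=lambda m: haversine_km(client_lat, client_lon, m.lat, m.lon))


# Hypothetical mirror entries; real data would come from the mirror list.
MIRRORS = [
    Mirror("mirror-de.example.org", 52.5, 13.4),   # Berlin
    Mirror("mirror-us.example.org", 40.7, -74.0),  # New York
    Mirror("mirror-jp.example.org", 35.7, 139.7),  # Tokyo
]

if __name__ == "__main__":
    # A client near Paris, as a GeoIP lookup might report it.
    print(nearest_mirror(48.9, 2.35, MIRRORS).host)
```

Pure distance ignores mirror capacity and network topology; it is only the simplest possible criterion for "geographically closer".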

Since current versions of the apt clients do not support HTTP redirection, the new method must remain backwards compatible, so that improved clients can be adopted gradually.
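One way to keep old clients working, sketched here under the assumption that improved clients announce themselves with a request header (the name `X-Apt-Redirect` is hypothetical and not an existing apt feature): the master mirror answers such clients with an HTTP 302 redirect and keeps serving the file directly to everyone else.

```python
from http import HTTPStatus

# Hypothetical header an improved apt client would send.
REDIRECT_CAPABLE_HEADER = "X-Apt-Redirect"


def plan_response(request_headers, path, mirror_base_url):
    """Decide how the master mirror answers a request for `path`.

    Returns (status, response_headers). A 200 means the master serves
    the file itself, which is what every legacy client needs.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {k.lower(): v for k, v in request_headers.items()}
    if normalized.get(REDIRECT_CAPABLE_HEADER.lower()) == "1":
        location = mirror_base_url.rstrip("/") + "/" + path.lstrip("/")
        return HTTPStatus.FOUND, {"Location": location}
    return HTTPStatus.OK, {}


if __name__ == "__main__":
    path = "/debian/dists/sarge/Release"
    base = "http://mirror-de.example.org/"
    print(plan_response({"x-apt-redirect": "1"}, path, base))
    print(plan_response({}, path, base))
```

Because the redirect is opt-in, an unmodified client never sees a status code it cannot handle.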

If the requested file is unavailable on a regional mirror, or the referred-to mirror is temporarily non-functional, the apt client should notify the master mirror via HTTP headers and then either be redirected to another regional mirror or receive the requested file from the master mirror itself.
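The fallback exchange could look roughly like this, assuming the client repeats its request to the master mirror with a header listing the mirrors that failed (the header name `X-Mirror-Failed` and its comma-separated format are hypothetical). The master then picks the nearest mirror not yet reported and serves the file itself once no candidate is left. To keep the sketch short, the per-client distance ranking is taken as already computed.

```python
def parse_failed_mirrors(request_headers):
    """Extract the set of mirror hosts the client reports as failed."""
    for name, value in request_headers.items():
        # Header names are case-insensitive.
        if name.lower() == "x-mirror-failed":
            return {h.strip() for h in value.split(",") if h.strip()}
    return set()


def choose_fallback(request_headers, mirrors_by_distance):
    """Return the next mirror host to try, or None to serve from the master.

    mirrors_by_distance: hosts ordered from nearest to farthest for this client.
    """
    failed = parse_failed_mirrors(request_headers)
    for host in mirrors_by_distance:
        if host not in failed:
            return host
    return None  # every regional candidate failed: the master serves the file


if __name__ == "__main__":
    ranked = ["mirror-de.example.org", "mirror-nl.example.org"]
    print(choose_fallback({}, ranked))
    print(choose_fallback({"X-Mirror-Failed": "mirror-de.example.org"}, ranked))
```

Keeping the failure list on the client side means the master needs no per-client session state between requests.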

Finally, the redirection data should be used to assess the reliability of each mirror and to help the master mirror administrators locate bottlenecks in the Debian archive mirror network.
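To turn the redirection data into a reliability measure, the master could count, per mirror, how many clients were sent there and how many came back reporting it as failed. The following sketch assumes such records exist as (mirror, event) pairs; both the record format and the 0.05 alert threshold are illustrative assumptions.

```python
from collections import Counter


def failure_rates(events):
    """Share of redirects to each mirror that were later reported as failed.

    events: iterable of (mirror_host, kind), kind being "redirect" or
        "failed" (hypothetical log format). Mirrors with failure reports
        but no recorded redirect are left out.
    """
    redirects = Counter()
    failures = Counter()
    for host, kind in events:
        if kind == "redirect":
            redirects[host] += 1
        elif kind == "failed":
            failures[host] += 1
    return {host: failures[host] / count for host, count in redirects.items()}


def unreliable_mirrors(events, threshold=0.05):
    """List mirrors whose failure rate exceeds the (illustrative) threshold."""
    rates = failure_rates(events)
    return sorted(host for host, rate in rates.items() if rate > threshold)


if __name__ == "__main__":
    log = ([("a.example.org", "redirect")] * 100 + [("a.example.org", "failed")] * 2
           + [("b.example.org", "redirect")] * 50 + [("b.example.org", "failed")] * 10)
    print(failure_rates(log))
    print(unreliable_mirrors(log))
```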