The Current State of Communication
Currently, the DebTorrent program uses the HTTP retrieval method to communicate with APT. It acts as an almost complete proxy for downloading files from HTTP mirrors, with two exceptions. First, since it treats Packages files as torrents, it notes when one is requested and starts the corresponding torrent. Second, when DebTorrent receives a request for a package file (which it identifies by extension), it finds the torrent that contains that file and downloads it using the DebTorrent protocol (i.e. not HTTP). Once the download is complete, it passes the file on to APT as if it had been downloaded directly from the HTTP mirror.
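The routing decision described above can be illustrated with a minimal sketch. The function name, the returned action labels, and the exact file-name patterns (".deb"/".udeb", "Packages" with an optional compression suffix) are assumptions made for illustration; they are not taken from the DebTorrent source.

```python
import posixpath

# Assumption: package files are recognised by these extensions.
PACKAGE_EXTENSIONS = (".deb", ".udeb")


def classify_request(path):
    """Decide how a proxy like the one described might handle a request path.

    Returns one of three hypothetical action labels.
    """
    name = posixpath.basename(path)
    # Packages files (optionally compressed) are proxied as usual, but also
    # trigger the start of the corresponding torrent.
    if name == "Packages" or name.startswith("Packages."):
        return "proxy_and_start_torrent"
    # Package files are fetched with the DebTorrent protocol, not HTTP.
    if name.endswith(PACKAGE_EXTENSIONS):
        return "torrent_download"
    # Everything else is passed straight through to the HTTP mirror.
    return "http_proxy"
```

Every request that is not a Packages file or a package falls through to the plain proxy path, which matches the "almost a complete proxy" behaviour.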
The major problems with this method are:
- BitTorrent downloads start off slowly, so downloading multiple packages one after another makes the complete download very slow.
- The user needs to see that downloading is happening (for example through a progress bar), which is not possible because BitTorrent downloads the pieces of files non-sequentially.
To solve the first problem of slow startup of downloads, multiple packages need to be downloaded at once from the same torrent, without waiting for one to finish before starting another. This could be as simple as telling APT to pipeline multiple requests to DebTorrent, which would alleviate some of the problem.
The second problem is trickier, as APT is only aware of when downloads begin and end. Pipelined downloads may help, since more files will be starting and finishing and the lack of progress information will be less noticeable to the user. However, BitTorrent downloads usually proceed in such a way that all the files complete at nearly the same time, at the end of the download. So, even with pipelined downloads, it may appear to the user that nothing is happening (at which point they may abort the download), until all the files suddenly arrive at the end.
The Proposed Method
Instead of HTTP, another protocol could be used by APT to request downloads of files, as well as other information from DebTorrent. This protocol would be indicated in the sources.list file by "debtorrent://...", or perhaps just "dt://...". A good candidate for this type of interaction would seem to be a form of [http://en.wikipedia.org/wiki/Remote_procedure_call RPC (Remote Procedure Call)]. The DebTorrent server would define several procedures for the APT client to call that would accomplish everything needed to solve the problems above.
Some example calls might be:
- int start_download(string pkg1, string pkg2, ...), which returns an ID used to refer to that download afterwards
- struct get_status(int ID), which returns a structure detailing the current status of the download (download speed, percent complete, files that have already completed downloading, ...)
- binary get_file(int ID, string pkg1), which returns the downloaded package from the torrent
and so on.
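To make the shape of these calls concrete, here is a minimal in-memory sketch in Python. The class name DebTorrentStub, the finish() helper, and the fields of the status structure are hypothetical. Progress is computed by counting finished files, which is a simplification; a real implementation would presumably report progress in bytes.

```python
import itertools


class DebTorrentStub:
    """Hypothetical in-memory stand-in for the DebTorrent side of the interface."""

    def __init__(self):
        self._ids = itertools.count(1)
        # download ID -> {package name: file contents, or None while pending}
        self._downloads = {}

    def start_download(self, *packages):
        """Register all requested packages at once and return a download ID."""
        download_id = next(self._ids)
        self._downloads[download_id] = {pkg: None for pkg in packages}
        return download_id

    def finish(self, download_id, package, data):
        """Illustration only: simulates the torrent completing one file."""
        self._downloads[download_id][package] = data

    def get_status(self, download_id):
        """Return a status structure that the client can poll."""
        files = self._downloads[download_id]
        completed = sorted(p for p, d in files.items() if d is not None)
        percent = 100.0 * len(completed) / len(files) if files else 100.0
        return {"percent_complete": percent, "completed": completed}

    def get_file(self, download_id, package):
        """Return the contents of a completed package."""
        data = self._downloads[download_id][package]
        if data is None:
            raise LookupError(package + " has not finished downloading")
        return data
```

Because start_download accepts several packages in a single call, they can all be fetched from the torrent together, and repeated get_status calls give the client something to show the user while the download is still running.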
The [http://www.xmlrpc.com/spec XML-RPC protocol] is a good candidate for the communication protocol, as it is simple, well understood, operates over HTTP, and contains all the functionality needed for the proposed method to work. There is also a library available in the Python standard library for [http://docs.python.org/lib/node658.html creating and parsing XML-RPC strings from Python data].
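As a rough illustration of what the standard library provides, the sketch below marshals one of the example calls and its possible responses without opening a network connection. It uses the Python 3 module xmlrpc.client (called xmlrpclib in Python 2). The package file names and the fields of the status structure are made up for the example.

```python
import xmlrpc.client

# Client side: encode a start_download call for two (hypothetical) packages.
request = xmlrpc.client.dumps(
    ("apt_1.0_i386.deb", "dpkg_1.0_i386.deb"),
    methodname="start_download",
)

# Server side: decode the request into its parameters and method name.
params, method = xmlrpc.client.loads(request)

# Server side: encode the returned download ID as a method response.
response = xmlrpc.client.dumps((1,), methodresponse=True)
(download_id,), _ = xmlrpc.client.loads(response)

# A get_status result maps onto an XML-RPC struct (a Python dict).
status_xml = xmlrpc.client.dumps(
    ({"percent_complete": 42.5, "completed": ["apt_1.0_i386.deb"]},),
    methodresponse=True,
)
(status,), _ = xmlrpc.client.loads(status_xml)

# A get_file result is sent as base64-encoded binary data.
file_xml = xmlrpc.client.dumps(
    (xmlrpc.client.Binary(b"\x00package-bytes"),),
    methodresponse=True,
)
(file_data,), _ = xmlrpc.client.loads(file_xml, use_builtin_types=True)
```

The binary return value of get_file travels as base64 inside the XML, which is one concrete source of transmission overhead for this encoding.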
Although the XML-RPC method does have the advantage of readability, it also involves a lot of overhead, both in transmitted bytes and parsing time. A more efficient method, which is also more familiar to BitTorrent users, would be to use [http://en.wikipedia.org/wiki/Bencode Bencoded data] instead of XML. Since all torrent information is stored bencoded, BitTorrent clients already have the functionality needed to bencode/bdecode the Python types that bencoding supports (integers, strings, lists and dictionaries). Of course, bencoded data is not as readable, and functionality would have to be added to APT to encode/decode it (though this is probably true of XML as well).
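For comparison, a minimal bencoder/bdecoder can be sketched in a few dozen lines of Python. This is an illustrative sketch, not the code used by any BitTorrent client; the function names mirror the usual bencode/bdecode naming, and text strings are assumed to be encoded as UTF-8. Note that bencoding has no floating-point type, so a value such as percent complete would have to be sent as an integer or a string.

```python
def bencode(value):
    """Encode ints, byte/text strings, lists and dicts as bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is sent as UTF-8
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys must be strings and appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


def _decode(data, pos):
    """Decode one value starting at pos; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            result[key], pos = _decode(data, pos)
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        start = colon + 1
        length = int(data[pos:colon])
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data at offset %d" % pos)


def bdecode(data):
    """Decode a complete bencoded byte string (strings come back as bytes)."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value
```

For example, bencode({"id": 1}) yields b"d2:idi1ee", which is far more compact than the equivalent XML-RPC struct. Decoded strings and dictionary keys are returned as bytes.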