Ansgar Burchardt: multi-archive support for dak

Project details

As an operating system distribution, Debian maintains a software archive with a large number of packages: the main archive contains a total of over 500,000 binary packages for 14 architectures, built from 30,000 source packages. To manage this archive, Debian uses custom software, the Debian archive kit (dak), which is written in Python with a PostgreSQL database backend and shell scripts to glue everything together.

dak provides the main archive on ftp.debian.org; Debian runs two additional installations for backports.debian.org and security.debian.org. In addition to the public view of the archive, dak provides a more frequently updated private view of the archive ("build queues") for the build network, which automatically builds uploaded packages for all architectures Debian supports.

My proposal aims to add support for running multiple archives from a single dak installation. This could be used to run both ftp.debian.org and backports.debian.org from a single installation, reducing administrative overhead. It would also allow build queues to be handled like the public archive view, by using a (private) regular archive instead.

In addition, having multi-archive support in dak would make it much easier to implement some additional features, most notably "personal" archive areas similar to Ubuntu's PPAs.

Needed changes to dak

dak currently assumes all files reside in a single storage area (archive). It needs to be changed to (a) store and look for files in a different area for each archive, (b) keep track of which archives a given file is present in and where it needs to be removed, and (c) copy files around if they are needed in an additional archive.

The last part can be implemented later, as it is not needed for replacing the buildd queues at least. It would be useful for merging backports.d.o, but I believe it's not strictly needed there either. So let's address the other points first:

First, dak needs to know which archive a suite belongs to. While there is an archive table in the database backend, it is only referenced (indirectly) by the files table. This needs to be changed to a more direct suite->archive mapping.
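As a rough illustration, a direct suite->archive mapping could look like the sketch below. It uses Python's sqlite3 module as an in-memory stand-in for the PostgreSQL backend; the table and column names, the archive paths, and the helper archive_root_for_suite are assumptions made for this example, not dak's actual schema or API.

```python
import sqlite3

# In-memory SQLite stands in for dak's PostgreSQL backend (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE archive (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        path TEXT NOT NULL              -- archive root on disk
    );
    CREATE TABLE suite (
        id         INTEGER PRIMARY KEY,
        suite_name TEXT NOT NULL UNIQUE,
        -- Hypothetical new column: each suite points directly at its archive.
        archive_id INTEGER NOT NULL REFERENCES archive(id)
    );
""")

# Example data; names and paths are placeholders.
conn.executemany(
    "INSERT INTO archive (id, name, path) VALUES (?, ?, ?)",
    [(1, "ftp-master", "/srv/example/ftp-master"),
     (2, "backports", "/srv/example/backports")],
)
conn.executemany(
    "INSERT INTO suite (suite_name, archive_id) VALUES (?, ?)",
    [("unstable", 1), ("stable-backports", 2)],
)


def archive_root_for_suite(conn, suite_name):
    """Return the root path of the archive that a suite belongs to."""
    row = conn.execute(
        """SELECT a.path
             FROM suite s
             JOIN archive a ON a.id = s.archive_id
            WHERE s.suite_name = ?""",
        (suite_name,),
    ).fetchone()
    if row is None:
        raise KeyError(suite_name)
    return row[0]
```

With such a column in place, a tool only needs the suite name to find out where its output belongs, without going through the files table.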

Tools generating or using files below dists/ can then be changed to do so relative to the archive root.

Then the files->location->archive mapping needs to be dropped, as a single file may later be available in multiple archives. This is a very large change and will require changes in most of the tools that make up dak.
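One possible replacement for that mapping is a many-to-many association between files and archives. The sketch below again uses an in-memory SQLite database as a stand-in for PostgreSQL; the table file_archive_map, the helper functions, and the example file name are hypothetical and only meant to show the idea.

```python
import sqlite3

# In-memory SQLite stands in for dak's PostgreSQL backend (assumption).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE archive (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE files   (id INTEGER PRIMARY KEY, filename TEXT NOT NULL UNIQUE);
    -- Hypothetical association table: one row per (file, archive) pair.
    CREATE TABLE file_archive_map (
        file_id    INTEGER NOT NULL REFERENCES files(id),
        archive_id INTEGER NOT NULL REFERENCES archive(id),
        PRIMARY KEY (file_id, archive_id)
    );
""")

# Example data: one source tarball that is present in two archives.
db.executemany("INSERT INTO archive (id, name) VALUES (?, ?)",
               [(1, "ftp-master"), (2, "backports")])
db.execute("INSERT INTO files (id, filename) VALUES (1, 'hello_1.0.orig.tar.gz')")
db.executemany("INSERT INTO file_archive_map (file_id, archive_id) VALUES (?, ?)",
               [(1, 1), (1, 2)])


def archives_holding(db, filename):
    """Return the names of all archives that contain the given file."""
    rows = db.execute(
        """SELECT a.name
             FROM files f
             JOIN file_archive_map m ON m.file_id = f.id
             JOIN archive a ON a.id = m.archive_id
            WHERE f.filename = ?
            ORDER BY a.name""",
        (filename,),
    ).fetchall()
    return [name for (name,) in rows]


def remove_from_archive(db, filename, archive_name):
    """Forget that a file is present in one archive; other archives are unaffected."""
    db.execute(
        """DELETE FROM file_archive_map
            WHERE file_id = (SELECT id FROM files WHERE filename = ?)
              AND archive_id = (SELECT id FROM archive WHERE name = ?)""",
        (filename, archive_name),
    )
```

A file would then only be deleted from disk in the archive whose row was removed, and the files entry itself can stay as long as any archive still references it.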

Once this is done, tools creating files inside of the pool/ areas can do so relative to the archive root as well (not all need to do so for now).
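For both dists/ and pool/, "relative to the archive root" amounts to building every path from the root of the archive in question rather than from one global directory. A minimal sketch, assuming the usual Debian repository layout (dists/<suite>/... and pool/<component>/<prefix>/<source>/...); the function names and example paths are made up for this illustration and are not dak's own helpers.

```python
from pathlib import PurePosixPath


def dists_path(archive_root, suite, *parts):
    """Path of an index file below dists/ for one suite of one archive."""
    return PurePosixPath(archive_root, "dists", suite, *parts)


def pool_prefix(source):
    """Pool subdirectory: first letter, or 'lib' plus one letter for lib* sources."""
    if source.startswith("lib") and len(source) > 3:
        return source[:4]
    return source[:1]


def pool_path(archive_root, component, source, filename):
    """Path of a package file below pool/ in one archive."""
    return PurePosixPath(
        archive_root, "pool", component, pool_prefix(source), source, filename
    )


# Usage with placeholder archive roots:
packages = dists_path("/srv/example/backports", "stable-backports",
                      "main", "binary-amd64", "Packages.gz")
tarball = pool_path("/srv/example/ftp-master", "main",
                    "libfoo", "libfoo_1.0.orig.tar.gz")
```

The same file name then maps to a different location in each archive simply by passing a different root.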

With these changes implemented the core of dak should be ready for multi-archive support.

Now the current build queue management can be replaced by a (private) archive area that uploaded packages also get installed into (same as the current build queues, but using the same code as the main archive). This will finally allow apt-ftparchive to be dropped.

I believe the project would be a success if I get this far, but some things remain to be done:

All tools creating files in pool/ need to do so relative to the archive root.

dak process-upload should look for additional files (like an .orig.tar.gz) in all archives and copy the file if needed. This would allow uploads to a (merged) backports.d.o that do not include the original source tarball. Note that even before this, an .orig.tar.gz included in a later upload will be required to match the one already in the database, even if the upload goes to a different archive (the same goes for binary packages).
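The lookup-and-copy step could be sketched roughly as follows. This is not how dak process-upload is implemented; the function ensure_in_archive, its parameters, and the idea of passing in the checksum already recorded in the database are assumptions for illustration only.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_in_archive(relpath, target_root, other_roots, known_sha256):
    """Make sure target_root/relpath exists, copying it from another archive if needed.

    relpath is the file's path relative to an archive root, and known_sha256
    is the checksum already recorded for that file (hypothetical inputs).
    """
    target = Path(target_root) / relpath
    if target.is_file():
        # The file is already there; it must match what is on record.
        if sha256_of(target) != known_sha256:
            raise ValueError(f"{target} does not match the recorded checksum")
        return target

    for root in other_roots:
        candidate = Path(root) / relpath
        if candidate.is_file():
            if sha256_of(candidate) != known_sha256:
                raise ValueError(f"{candidate} does not match the recorded checksum")
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(candidate, target)
            return target

    raise FileNotFoundError(relpath)
```

An upload to a merged backports archive that omits the .orig.tar.gz could then call something like this with the main archive in other_roots, and reject the upload only if the tarball is found nowhere.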

Deliverables

1. Adapt database and tools to handle multiple archives.

2. a. Change build queues to use the same tools as the regular archive (clean-suites, dominate, generate-packages-sources2, ...)

Both should be possible to implement with the groundwork from the first phase.

Project schedule

Progress reports