Ansgar Burchardt: multi-archive support for dak
Name: Ansgar Burchardt
Contact/Email: firstname.lastname@example.org, ansgar on OFTC
Background: I study mathematics at the University of Heidelberg, Germany. I have used Debian since about 2000, started contributing in 2008 and finally became a Developer in 2010. As I am also on the FTP team, I use dak almost daily and have also contributed several patches to it.
Exams and other commitments: None.
Other summer plans: DebConf12. No other plans that take several days.
Why Debian?: The reasons I started contributing to Debian still hold: I use it all the time and believe Debian plays an important role in making free software available (more so than other distributions, thanks to the DFSG and the Social Contract).
- Are you applying for other projects in SoC? No.
Project title: multi-archive support for dak
- Add support for running several software repositories from a single installation of Debian's archive software (dak).
Mentor: I asked Joerg Jaspert and he is okay with mentoring this project.
Benefits to Debian: see project details
As an operating system distribution, Debian maintains a software archive with a large number of packages: the main archive contains over 500,000 binary packages for 14 architectures, built from 30,000 source packages. To manage this archive, Debian uses custom software, the Debian archive kit (dak), which is written in Python with a PostgreSQL database backend, plus shell scripts that glue everything together.
dak runs the main archive on ftp.debian.org, and Debian operates two additional installations for backports.debian.org and security.debian.org. Besides the public view of the archive, dak provides a more frequently updated private view ("build queues") for the build network, which automatically builds uploaded packages for all architectures Debian supports.
My proposal aims to add support for running multiple archives from a single dak installation. This could be used to run both ftp.debian.org and backports.debian.org from a single installation, reducing administrative overhead. It would also allow build queues to be handled in the same way as the public archive, by using a (private) regular archive instead.
In addition, multi-archive support in dak would make it much easier to implement further features, most notably "personal" archive areas similar to Ubuntu's PPAs.
Needed changes to dak
dak currently assumes that all files reside in a single storage area (archive). It needs to be changed to (a) store and look for files in a separate area for each archive, (b) keep track of which archives a given file is present in and from which it needs to be removed, and (c) copy files between archives when they are needed in an additional archive.
The last part can be implemented later, as it is not needed for replacing the build queues. It would be useful for merging backports.d.o, but I believe it is not strictly needed there either. So let's address the other points first:
First, dak needs to know which archive a suite belongs to. While there is an archive table in the database backend, it is only referenced (indirectly) by the files table. This needs to be replaced by a direct suite->archive mapping.
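A minimal sketch of what a direct suite->archive mapping could look like. The schema is hypothetical and heavily simplified (table names, column names and paths are illustrative, not dak's actual ones), and sqlite3 stands in for PostgreSQL only to keep the example self-contained:

```python
import sqlite3

# Hypothetical, simplified schema: every suite references the archive it
# belongs to, and each archive records the path of its root directory.
SCHEMA = """
CREATE TABLE archive (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    path TEXT NOT NULL              -- archive root on disk
);
CREATE TABLE suite (
    id         INTEGER PRIMARY KEY,
    suite_name TEXT NOT NULL UNIQUE,
    archive_id INTEGER NOT NULL REFERENCES archive(id)
);
"""


def archive_root_for_suite(conn: sqlite3.Connection, suite_name: str) -> str:
    """Return the root directory of the archive that owns suite_name."""
    row = conn.execute(
        "SELECT a.path FROM suite s JOIN archive a ON a.id = s.archive_id"
        " WHERE s.suite_name = ?",
        (suite_name,),
    ).fetchone()
    if row is None:
        raise KeyError(suite_name)
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical archive roots for two archives served by one installation.
conn.executemany(
    "INSERT INTO archive (id, name, path) VALUES (?, ?, ?)",
    [(1, "ftp-master", "/srv/archives/main"),
     (2, "backports", "/srv/archives/backports")],
)
conn.executemany(
    "INSERT INTO suite (suite_name, archive_id) VALUES (?, ?)",
    [("unstable", 1), ("squeeze-backports", 2)],
)
print(archive_root_for_suite(conn, "squeeze-backports"))
```

With the archive reachable from the suite in one join, a tool that only knows the suite name no longer has to go through the files table to find out where to write.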
Tools generating or using files below dists/ can then be changed to do so relative to the archive root.
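As an illustration, building paths below dists/ relative to an archive root (instead of a single global root) could be as simple as the following sketch; the function name and the example root are hypothetical:

```python
from pathlib import PurePosixPath


def dists_path(archive_root: str, suite_name: str, *parts: str) -> PurePosixPath:
    """Build a path below <archive root>/dists/<suite> for the given archive."""
    return PurePosixPath(archive_root, "dists", suite_name, *parts)


# Hypothetical archive root of a backports archive.
root = "/srv/archives/backports"
print(dists_path(root, "squeeze-backports", "Release"))
print(dists_path(root, "squeeze-backports", "main", "binary-amd64", "Packages"))
```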
Next, the files->location->archive mapping needs to be dropped, as a single file may later be available in multiple archives. This is a very large change and will require changes to most of the tools that make up dak.
Once this is done, tools creating files inside the pool/ areas can do so relative to the archive root as well (not all of them need to do so for now).
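Replacing the single location reference with a many-to-many mapping could look roughly like this sketch. The schema and the `files_archive_map` table are hypothetical simplifications, again using sqlite3 only for self-containment. Removing a file from one archive deletes one mapping row; only when no archive references the file any more can its database entry be dropped:

```python
import sqlite3

# Hypothetical simplified schema: a file no longer points at one location;
# a separate table records every archive the file is present in.
SCHEMA = """
CREATE TABLE archive (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE files (id INTEGER PRIMARY KEY, filename TEXT NOT NULL UNIQUE);
CREATE TABLE files_archive_map (
    file_id    INTEGER NOT NULL REFERENCES files(id),
    archive_id INTEGER NOT NULL REFERENCES archive(id),
    PRIMARY KEY (file_id, archive_id)
);
"""


def archives_containing(conn: sqlite3.Connection, filename: str) -> list:
    """Names of all archives the file is currently present in, sorted."""
    rows = conn.execute(
        "SELECT a.name FROM files f"
        " JOIN files_archive_map m ON m.file_id = f.id"
        " JOIN archive a ON a.id = m.archive_id"
        " WHERE f.filename = ? ORDER BY a.name",
        (filename,),
    )
    return [name for (name,) in rows]


def remove_from_archive(conn: sqlite3.Connection, filename: str, archive_name: str) -> bool:
    """Drop the file from one archive; return True if no archive references it any more."""
    conn.execute(
        "DELETE FROM files_archive_map WHERE"
        " file_id = (SELECT id FROM files WHERE filename = ?) AND"
        " archive_id = (SELECT id FROM archive WHERE name = ?)",
        (filename, archive_name),
    )
    return not archives_containing(conn, filename)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO archive (id, name) VALUES (?, ?)",
                 [(1, "ftp-master"), (2, "build-queue")])
conn.execute("INSERT INTO files (id, filename) VALUES (1, 'hello_2.8.orig.tar.gz')")
conn.executemany("INSERT INTO files_archive_map VALUES (?, ?)", [(1, 1), (1, 2)])
print(archives_containing(conn, "hello_2.8.orig.tar.gz"))
```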
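A sketch of computing a pool/ path relative to an archive root. It follows the familiar Debian pool layout (first letter of the source name, or "lib" plus the next letter for lib* sources); the function names and the example root are hypothetical and not dak's actual helpers:

```python
from pathlib import PurePosixPath


def pool_prefix(source: str) -> str:
    """Pool subdirectory for a source package: 'libf' for libfoo, 'h' for hello."""
    if source.startswith("lib") and len(source) > 3:
        return source[:4]
    return source[0]


def pool_path(archive_root: str, component: str, source: str, filename: str) -> PurePosixPath:
    """Location of a file in the pool/ area of the given archive."""
    return PurePosixPath(archive_root, "pool", component,
                         pool_prefix(source), source, filename)


# Hypothetical archive root.
print(pool_path("/srv/archives/main", "main", "hello", "hello_2.8.orig.tar.gz"))
```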
With these changes implemented the core of dak should be ready for multi-archive support.
Then the current build queue management can be replaced by a (private) archive area into which uploaded packages are also installed (the same as the current build queues, but using the same code as the main archive). This will finally allow us to drop apt-ftparchive.
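The idea of treating a build queue as just another (private) archive can be sketched with a few hypothetical classes. Installing an upload goes through one code path for every target suite, and the only difference for the build queue is a flag that keeps it off the mirrors. The in-memory `files` set stands in for copying files to disk; all names here are illustrative assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    name: str
    root: str
    private: bool = False                               # private archives are not mirrored
    files: set[str] = field(default_factory=set)        # stand-in for files on disk


@dataclass
class Suite:
    name: str
    archive: Archive


def install_upload(filenames: set, target: Suite, extra_suites=()) -> list:
    """Install an upload into the target suite and any extra suites, using one code path."""
    touched = []
    for suite in (target, *extra_suites):
        suite.archive.files.update(filenames)
        touched.append(suite.archive.name)
    return touched


def mirrored_archives(archives) -> list:
    """Archives whose contents would be pushed to public mirrors."""
    return [a.name for a in archives if not a.private]


main = Archive("ftp-master", "/srv/archives/main")
build_queue = Archive("build-queue", "/srv/archives/build-queue", private=True)
unstable = Suite("unstable", main)
buildd_unstable = Suite("buildd-unstable", build_queue)
print(install_upload({"hello_2.8-1.dsc"}, unstable, (buildd_unstable,)))
```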
I believe the project would be a success if I get this far, but some things remain to be done:
All tools creating files in pool/ need to do so relative to the archive root.
dak process-upload should look for additional files (like an .orig.tar.gz) in all archives and copy the file if needed. This would allow uploads to a (merged) backports.d.o that do not include the original source tarball. Note that even before this, an .orig.tar.gz included in a later upload will be required to match one already in the database, even if the upload goes to a different archive (the same goes for binary packages).
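A simplified sketch of both checks, assuming flat pool directories (the real pool/ layout has subdirectories) and a plain dict standing in for checksums recorded in the database. The function names are hypothetical; this is not dak's process-upload code:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def check_against_database(name: str, uploaded_sha256: str, db_checksums: dict) -> None:
    """Reject an uploaded file whose name is already known with a different checksum,
    regardless of which archive the upload targets."""
    recorded = db_checksums.get(name)
    if recorded is not None and recorded != uploaded_sha256:
        raise ValueError(f"{name} differs from the copy already in the database")


def fetch_missing_file(name: str, known_sha256: str, target_pool: Path, other_pools) -> Path:
    """Make `name` available in target_pool, copying it from another archive if needed."""
    dest = target_pool / name
    if dest.exists():
        return dest
    for pool in other_pools:
        src = pool / name
        if src.exists():
            if sha256_of(src) != known_sha256:
                raise ValueError(f"checksum mismatch for {src}")
            shutil.copy2(src, dest)
            return dest
    raise FileNotFoundError(name)


# Example: a backports upload without the .orig.tar.gz, which already exists
# in the main archive (directory names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    main_pool, bpo_pool = Path(tmp, "main"), Path(tmp, "backports")
    main_pool.mkdir()
    bpo_pool.mkdir()
    (main_pool / "hello_2.8.orig.tar.gz").write_bytes(b"example tarball")
    digest = hashlib.sha256(b"example tarball").hexdigest()
    copied = fetch_missing_file("hello_2.8.orig.tar.gz", digest, bpo_pool, [main_pool])
    copied_ok = copied.read_bytes() == b"example tarball"
```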
1. Adapt database and tools to handle multiple archives.
- This will require changes in many places that access the filesystem and some changes to the database.
2. a. Change build queues to use the same tools as the regular archive (clean-suites, dominate, generate-packages-sources2, ...)
- b. Make it possible to merge backports.debian.org with ftp-master.debian.org.
Both should be possible to implement with the groundwork from the first phase.
- May 7th-13th: create a small test installation of dak and get a minimal dinstall working (with mirror pushes etc. disabled)
- May 14th-27th: first round of database changes:
- reference the archive table in a more useful way, introduce a path to the archive root, and adapt tools that create files in dists/ to do so relative to the archive root
- May 28th-June 24th: second round of database changes:
- keep track of which archives a file belongs to and adapt tools that create or look for files in pool/ to this change; dak clean-suites might need larger changes. I expect this to be the part requiring the most work.
- June 25th-July 22nd: replace old build queues:
- replace build queues by a private archive.
Note: I want to finish this in about two weeks, but I plan to attend DebConf from July 1st to 14th, during which I will likely spend less time on this project.
- July 23rd-August 5th: adapt the remaining tools that create files in pool/ to do so relative to the archive root