Mentor: Ganneff / FTPMaster
Summary: Design and implement a Smart Upload Server for Debian package uploads
- Python programming
Currently, Debian employs an FTP-based upload solution known as the "Upload Queue" to handle uploads, and a queue daemon, debianqueued, to manage them. Uploads that pass a basic test, such as a GPG key check, are copied to the real archive's incoming folder.
This has several drawbacks: besides requiring an FTP server, nothing on the receiving side knows what is coming in until the last file (the .changes file describing the actual upload) has finally been transmitted. As a result, debianqueued must make several assumptions when it sees a (still) incomplete upload.
This should be replaced by an upload server that has full knowledge of Debian packages and uploads. In contrast to the current process, an upload should start by sending the .changes file that describes it. The new upload server can then run basic checks right away and, in case of errors, immediately tell the user. This will help a lot, especially for maintainers of large packages (think of the kernel or OpenOffice.org, for example). The new upload server can also run a set of checks on binary and source packages at the time it receives them, furthering the "test early, report early" mantra. The results of those checks are stored for later use in dak, so the work is not needlessly duplicated in two tools.
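
As an illustration of a check that becomes possible once the .changes file arrives first, the following is a minimal sketch of verifying a received file against the size and SHA-256 digest declared in the .changes file. It assumes the GPG signature has already been verified and stripped, and that the file carries a Checksums-Sha256 field; the function names parse_sha256_entries and check_received_file are hypothetical and not part of dak or debianqueued.

```python
import hashlib


def parse_sha256_entries(changes_text):
    """Return {filename: (sha256_hex, size)} from the Checksums-Sha256 field.

    Assumes the text is the plain (already unsigned) body of a .changes file.
    """
    entries = {}
    in_field = False
    for line in changes_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field:
            # Continuation lines of a multi-line field start with whitespace.
            if line.startswith((" ", "\t")) and line.strip():
                digest, size, name = line.split()
                entries[name] = (digest, int(size))
            else:
                in_field = False
    return entries


def check_received_file(name, data, entries):
    """Return a list of error strings; an empty list means the file passed."""
    if name not in entries:
        return [f"{name}: not listed in .changes"]
    digest, size = entries[name]
    errors = []
    if len(data) != size:
        errors.append(f"{name}: size {len(data)} does not match declared {size}")
    if hashlib.sha256(data).hexdigest() != digest:
        errors.append(f"{name}: SHA-256 checksum mismatch")
    return errors
```

With a check like this, a broken transfer can be reported as soon as the affected file has been received, rather than after the whole upload has been processed.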
If time permits, the archive could be adapted to ACCEPT or REJECT a package instantly at upload time, instead of having to wait for the next cron-triggered run of that process.
This project needs to:
- find a sane way of implementing this (WebDAV? HTTP POST? own server?). A custom server should be only the very last option: use an existing protocol for the transfer!
- define a proper "protocol" between server and upload clients
- implement it in Python, using a well-known and supported framework, and in dak.
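
To make the "protocol" item more concrete, here is a minimal, transport-independent sketch of the ordering rule described above: the .changes file comes first, and afterwards only the files it announces are accepted. The class and method names (UploadSession, receive_changes, receive_file) are hypothetical; a real implementation would sit behind whatever transfer protocol is chosen and would also run the actual content checks.

```python
class UploadRejected(Exception):
    """Raised as soon as an upload violates the expected sequence."""


class UploadSession:
    """Hypothetical per-upload state: .changes first, then only listed files."""

    def __init__(self):
        self.expected = None  # filenames announced by the .changes file
        self.received = set()

    def receive_changes(self, filenames):
        if self.expected is not None:
            raise UploadRejected("a .changes file was already sent")
        if not filenames:
            raise UploadRejected(".changes lists no files")
        self.expected = set(filenames)

    def receive_file(self, name):
        if self.expected is None:
            raise UploadRejected("send the .changes file first")
        if name not in self.expected:
            raise UploadRejected(f"{name} is not listed in .changes")
        if name in self.received:
            raise UploadRejected(f"{name} was already uploaded")
        self.received.add(name)

    @property
    def complete(self):
        # True once every announced file has arrived.
        return self.expected is not None and self.received == self.expected
```

A client adaptation (dput, for example) would then send the .changes file, wait for the server's verdict, and continue with the remaining files only if it was accepted.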
General information about ftpmaster/dak: http://lists.debian.org/debian-devel-announce/2010/03/msg00003.html
An idea for an upload server that this can build on: http://lists.debian.org/debian-dak/2008/12/msg00052.html
IRC log with some discussion about it:
07-04-2010 23:06:04 <laszlo> ok, so as I see the SmartUploadServer will be used only for big packages with a lot of dependencies :> ?
07-04-2010 23:06:14 <Ganneff> no
07-04-2010 23:06:19 <Ganneff> it will be used for everything
07-04-2010 23:06:26 <Ganneff> dqueued will completly die
07-04-2010 23:07:57 <svuorela> it is going to be a part of the debian infrastructure used 100 times each day by all people contributing to debian, right ?
07-04-2010 23:08:12 <Ganneff> yes
07-04-2010 23:08:18 <laszlo> mhm, frankly speaking I not clearly understand the idea ...
07-04-2010 23:08:22 <Ganneff> simple
07-04-2010 23:08:30 <laszlo> dqueued was a kind of ftp syncer ?
07-04-2010 23:08:34 <Ganneff> the current way is that people upload using ftp. into an anonymous account.
07-04-2010 23:08:45 <Ganneff> this is read by an abomination of a perl script
07-04-2010 23:08:53 <Ganneff> processed in obscure ways
07-04-2010 23:08:59 <Ganneff> moved into another directory
07-04-2010 23:09:06 <Ganneff> where there finally dak sees it and goes and checks.
07-04-2010 23:09:36 <Ganneff> so any upload error takes ages before we reject on. even though we could do it right after getting the broken file.
07-04-2010 23:09:50 <Ganneff> also, the perl script is about unmaintainable nowadays. its just too old and to often adapted
07-04-2010 23:10:19 <laszlo> so e.g the big pkg of openoffice
07-04-2010 23:10:37 <laszlo> its divided into lots of parts
07-04-2010 23:11:02 <Ganneff> it doesnt matter if its a big or a small package :)
07-04-2010 23:11:27 <Ganneff> the process now is "upload all by ftp. changes file last" "when dqueued sees changes file, process it (check sig, move all files over if they are there, mail".
07-04-2010 23:12:05 <Ganneff> process in future: "ah, a changes file transferred to it. do checks on that. all fine? allow next file. ah fine, source. check source. all fine? next. ah, a deb. oh bad, lintian errors. stop whole upload here before any more gets in."
07-04-2010 23:13:08 <ArthurLiu> a backward compatible fake-ftp mode could be useful
07-04-2010 23:13:13 <Ganneff> ArthurLiu: but actually, about the clients i dont care much. the project should get one running (dput adaption for example), but otherwise people are free to just use the protocol and do their own if they want
07-04-2010 23:15:52 <laszlo> Ganneff: what are the contents of .changes file now?
07-04-2010 23:16:37 <Ganneff> laszlo: http://ftp.de.debian.org/debian/dists/proposed-updates/
07-04-2010 23:18:37 <ArthurLiu> FTP with custom messaging is a good solution IMHO
07-04-2010 23:19:06 <ArthurLiu> it's even backwards compatible and is well understood
07-04-2010 23:22:07 <laszlo> and what are the usuall errors concering the users upload nowadays ?
07-04-2010 23:22:18 <Ganneff> ArthurLiu: ftp, http, whatever the student wants. though http is better than ftp.
07-04-2010 23:22:29 <svuorela> laszlo: broken network connections, bad signatures, bad checksums of source package
07-04-2010 23:22:55 <Ganneff> laszlo: broken/unkown sig. various lintian errors. version to old compared to upload. unknown distribution. wrong files/broken checksums, those are just few of the early-detectable things.
07-04-2010 23:23:57 <Ganneff> trying to overwrite existing files (no go). and whatever we can find and think about. (the check part should be open, so easily adjust/extendable)
07-04-2010 23:26:02 <ArthurLiu> Ganneff, that sounds distributable to me
07-04-2010 23:26:18 <Ganneff> ArthurLiu: ?
07-04-2010 23:27:21 <ArthurLiu> the checking part sounds like it can be load balanced between multiple hosts
07-04-2010 23:27:42 <Ganneff> ArthurLiu: and what does this help?
07-04-2010 23:27:47 <Ganneff> (nothing is the answer :) )
07-04-2010 23:27:58 <ArthurLiu> reliability ?
07-04-2010 23:28:01 <Ganneff> no
07-04-2010 23:28:05 <Ganneff> not at all
07-04-2010 23:28:17 <Ganneff> you might remember that our uploadqueues kept working
07-04-2010 23:29:16 <ArthurLiu> fair enough
07-04-2010 23:34:16 <laszlo> Ganneff: are all the uploaded packages tested through test installation on some server?
07-04-2010 23:34:26 <Ganneff> none at all
07-04-2010 23:34:34 <Ganneff> (not from the queue at least)
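
The log asks for the check part to be open and easily extendable. One possible shape for that, shown purely as a sketch, is a registry of small check functions that are run in order and stop the upload at the first failure. Everything here is an assumption for illustration: the names CHECKS, check and run_checks, the dict-based representation of a parsed .changes file, and the contents of KNOWN_DISTRIBUTIONS.

```python
CHECKS = []


def check(func):
    """Decorator that registers a check; new checks only need this line."""
    CHECKS.append(func)
    return func


# Illustrative only; a real server would take this from the archive config.
KNOWN_DISTRIBUTIONS = {"unstable", "experimental"}


@check
def distribution_is_known(upload):
    dist = upload.get("Distribution", "")
    if dist not in KNOWN_DISTRIBUTIONS:
        return f"unknown distribution: {dist!r}"
    return None


@check
def lists_at_least_one_file(upload):
    if not upload.get("Files"):
        return "upload lists no files"
    return None


def run_checks(upload):
    """Run all registered checks in order; return the first error or None."""
    for registered in CHECKS:
        error = registered(upload)
        if error:
            return error
    return None
```

Checks for the other early-detectable problems named in the log (signatures, lintian errors, versions, attempts to overwrite existing files) would be added the same way, without touching the code that receives the files.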