Debian development and release procedures: always releasable testing
The wheezy freeze has been much too long. At ten months, it's four months longer than what we've gotten used to in several previous releases. Had we managed to keep the freeze at six months, it would still have been too long. I believe there is something wrong in how we develop Debian, and how we do releases, and that by fixing them, we can have much shorter releases, with an increase in their quality.
Freezes are long in part because we need to do so much work during them. Most importantly, we need to fix so many release-critical bugs (RC bugs) that a short freeze is not possible without drastically lowering the quality of Debian.
A long freeze is highly frustrating to everyone. It's a very stressful period for the release team, obviously, but since the freeze affects all development, even those of our developers who do not care about the release feel its effects in their work. Our users would like fresh upstream versions, but those rarely reach unstable during a freeze, and because the freeze is so long, much software seems a bit stale by the time the release actually happens. Upstreams, who would like to get their software into the hands of users as soon as possible, including via Debian, are also frustrated.
We should aim for a short freeze, perhaps as short as two weeks, and certainly no longer than two months. This would remove the frustration and fix the other problems caused by a long freeze. However, to achieve a short freeze, we need to change how we develop Debian.
The fundamental change is to keep our "testing" branch as close to releasable as possible, at all times. For individual projects, this corresponds to keeping the master or trunk branch in version control ready to be released. Practitioners of agile development models, for example, do this quite successfully by applying continuous integration and automatic testing, and by having a development culture in which fixing a severe bug in master gets the highest priority.
We can do similar things in Debian, and if we do, I believe that we can keep testing in a releasable state for almost all of the development cycle between two releases. In my opinion, the minimum changes necessary to achieve this are:
- An attitude change: we decide that releases are important, and that they're the job of the entire project, not just the release team.
- Keep testing free of RC bugs.
- We should use automatic testing much more extensively, to find problems as early as possible.
- We should limit the number of packages we strongly care about for a release.
Releases are important
Releases are important to many, perhaps most, of our users. Hackers and hardcore power users don't necessarily care about them, of course, but most others do. A released version of Debian implies that the operating system works: there's a working installer, for example. It also implies that all the packages are expected to work together: there are no transitions going on, for example, that might break dependencies or reverse dependencies.
A release is important to many users because it means that if they have to reinstall, they will get back the same system they used to have. Or they can install another computer that will behave the same way as the first one. This reproducibility is also why enterprises like releases: they can confidently assume that if they install fifty thousand machines, they'll all be the same. Without this kind of uniformity, system administration and end-user support costs become unmanageable.
But releases are also important for us, as a project. They're an excellent point to stop and say, "we have achieved this, and it is good". It's a reason for others to have a look at Debian and see that it is good. This generates a good feeling, which gives us more motivation to work on Debian.
It's true that we can't expect every Debian developer to care about making a release. That's OK. We just need the minority who don't care not to get in the way of the release.
Keep testing free of RC bugs
The RC bug count for the testing branch should be kept low at all times. Right after a release, by definition, testing is free of RC bugs. With the current development model, right after the release we open the floodgates, and a large number of new packages and versions enter testing. The bug count skyrockets, but we don't worry much about that until the next freeze gets closer. This means testing is nowhere near releasable during most of the development cycle.
We should, instead, make sure testing is kept free of RC bugs as much as possible. There are a variety of things we can do about it:
Remove RC-buggy packages sooner rather than later. An RC-buggy package should be removed as soon as possible: when the bug is identified, allow a bit of time for the bug to be verified (was it actually an RC bug?), but after that, remove the package from testing, preferably automatically. If the package has reverse dependencies, remove those as well. This keeps testing releasable. The removed package can and will re-enter testing once it gets fixed.
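Removing a package "and its reverse dependencies" means walking the reverse-dependency graph transitively, since a reverse dependency may itself have reverse dependencies. The following is a minimal sketch under a deliberately simplified, hypothetical model: a plain map from each package in testing to the names it depends on, with no alternatives, versioned constraints, or virtual packages, all of which real Debian dependency data has. The package names are made up for illustration.

```python
from collections import deque
from typing import Dict, Set


def removal_set(rc_buggy: str, depends: Dict[str, Set[str]]) -> Set[str]:
    """Return rc_buggy plus every package that transitively depends on it.

    `depends` maps each package in testing to the set of package names it
    depends on (simplified model: no alternatives, versions, or virtuals).
    """
    # Invert the dependency map once so we can walk reverse dependencies.
    rdepends: Dict[str, Set[str]] = {}
    for pkg, deps in depends.items():
        for dep in deps:
            rdepends.setdefault(dep, set()).add(pkg)

    # Breadth-first walk: anything that depends on a removed package goes too.
    to_remove = {rc_buggy}
    queue = deque([rc_buggy])
    while queue:
        current = queue.popleft()
        for rdep in rdepends.get(current, set()):
            if rdep not in to_remove:
                to_remove.add(rdep)
                queue.append(rdep)
    return to_remove


# Hypothetical packages in testing.
depends = {
    "libfoo1": set(),
    "foo-utils": {"libfoo1"},
    "foo-gui": {"foo-utils", "libgtk"},
    "libgtk": set(),
    "bar": {"libgtk"},
}
print(sorted(removal_set("libfoo1", depends)))  # ['foo-gui', 'foo-utils', 'libfoo1']
```

Because the removed packages can re-enter testing once fixed, the closure only needs to be computed at removal time; nothing here is permanent.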
To reduce the sting of optional packages missing the release, we should consider whether we're willing to introduce new packages in stable point releases. Obviously, only packages that have no new dependencies could be introduced that way, so things that require newer versions of the packages already in stable would not be eligible. But it means that if a package was in the previous stable but missed the current stable due to unresolved issues at the time of the release, we could still get it back in, and it wouldn't have to wait another year or two.
We would need some staging area to ensure that the stable build of the package was actually tested. Backports could be used for that purpose.
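The eligibility rule for adding a package in a point release can be written down as a small check. This is only a sketch under stated assumptions: the data structures are hypothetical, dependencies are plain names with a minimum version, and versions are pre-parsed into comparable tuples. Real Debian version comparison (epochs, tildes, revisions) follows dpkg's rules and is more involved.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]  # simplification: real Debian versions need dpkg ordering


def eligible_for_point_release(
    pkg: str,
    deps: Dict[str, Version],
    previous_stable: Dict[str, Version],
    current_stable: Dict[str, Version],
) -> bool:
    """Decide whether `pkg` could be (re)introduced in a stable point release.

    deps maps each dependency of the stable build of `pkg` to the minimum
    version it requires.
    """
    # Only packages that were in the previous stable but missed this one.
    if pkg not in previous_stable or pkg in current_stable:
        return False
    # No new dependencies: each must already be satisfied by current stable.
    for dep, min_version in deps.items():
        have = current_stable.get(dep)
        if have is None or have < min_version:
            return False
    return True


# Illustrative data only.
previous = {"nethack-console": (3, 4, 3), "libc6": (2, 13)}
current = {"libc6": (2, 19), "bash": (4, 3)}
print(eligible_for_point_release("nethack-console", {"libc6": (2, 13)}, previous, current))
```

The staging step (for example via backports) would sit in front of this check; the sketch only covers the dependency rule.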
When a package is too important to be removed from testing (e.g., gcc or bash) and it gets an RC bug, all developers should be encouraged to help fix it. This can be done in various ways, from the fun (perhaps a bug-squashing party, or BSP, aimed at that one bug only) to the dictatorial (preventing all uploads to unstable unless they fix an RC bug in testing).
When the RC bug count in testing grows above a particular threshold, we should have a bug-fix-only mini-freeze: stop the migration of packages to testing, except for packages that fix RC bugs. Ideally, we would automate as much of this as possible rather than making the release team do it manually. When the RC bug count drops back below the threshold, we re-open testing. This provides a constant feedback cycle: if we're not managing RC bugs properly, testing stays frozen more and more often, which puts more pressure on us to manage them properly.
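Automating the mini-freeze amounts to a small state machine driven by the RC bug count. The sketch below is one possible reading of the rule, not an existing tool: the class name and the threshold value are hypothetical, and at exactly the threshold it simply keeps its current state, since the text only says "above" (freeze) and "below" (re-open).

```python
class MiniFreezeGate:
    """Bug-fix-only mini-freeze driven by the RC bug count in testing."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.frozen = False

    def update(self, rc_bug_count: int) -> bool:
        """Record the current RC bug count; return whether testing is frozen."""
        if rc_bug_count > self.threshold:
            self.frozen = True   # count grew above the threshold: freeze
        elif rc_bug_count < self.threshold:
            self.frozen = False  # count dropped back below it: re-open
        # Exactly at the threshold, the current state is kept.
        return self.frozen

    def may_migrate(self, fixes_rc_bug: bool) -> bool:
        """While frozen, only packages that fix RC bugs may migrate."""
        return not self.frozen or fixes_rc_bug


gate = MiniFreezeGate(threshold=50)  # hypothetical threshold
gate.update(61)
print(gate.may_migrate(fixes_rc_bug=False), gate.may_migrate(fixes_rc_bug=True))
```

One design choice left open by the text: a real gate might use a lower re-open threshold than freeze threshold, so that testing does not flap between frozen and open when the count hovers around a single value.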
Not having RC bugs in testing is a necessary (though not sufficient) condition for releasing. We have to keep the count as close to zero as possible at all times in order to keep the freeze short.
Limit the number of packages we strongly care about
Debian is now much too big to give every package the same importance as far as the release is concerned. In reality, we don't: the release team has a much lower threshold for removing nethack than it has for bash. We can release without nethack, but not without the default shell.
We should codify this, and make explicit which packages are necessary for the release and which are not. I propose a set of "reference installations" of Debian, for various purposes. We already have a related concept, the "task", in the installer:
- ssh server
- mail server
- LAMP server
- desktop system
- print server
We should have an explicit list of such reference installations and declare them as crucial for the release: if they work, we can release, and if they don't, we can't. Each reference installation should have a clearly defined purpose, and therefore a clearly defined list of packages that must be included.
A package that is not included in one or more of the reference installations is a package we want to include in the release, but we will not delay the release for its sake. We should have a low threshold for removing such a package from testing: it could perhaps even be removed automatically one week after an RC bug is filed against it (assuming the bug affects the version in testing).
This creates two classes of citizenship for packages. This is unavoidable, and is actually already the case. It is not a criticism of the packages, or their maintainers, if they're not included in a reference installation. Nethack just isn't as important as bash.
The only difference between packages included in reference installations and those not included is that packages in reference installations have a higher threshold to be removed from testing. (If a reference installation does not meet quality criteria, the release team has the option of dropping it.)
The set of reference installations requires careful thought and broad consensus. They define the packages we, as a project, especially wish to support. It should also be possible to verify the quality of each reference installation: there should be an automatic test suite of sufficient coverage and quality that it makes sense to let it be crucial for the release.
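The two-class removal policy can be expressed as a function of whether a package belongs to any reference installation. A minimal sketch, assuming hypothetical package lists (the real lists would need the broad consensus described above) and the one-week grace period suggested for packages outside reference installations. The text only says reference packages get a "higher threshold"; escalating them to humans instead of removing them automatically is an assumption of this sketch.

```python
from datetime import date, timedelta

# Hypothetical reference installations; illustrative package lists only.
REFERENCE_INSTALLATIONS = {
    "ssh-server": {"openssh-server", "bash", "sudo"},
    "mail-server": {"postfix", "dovecot-imapd", "dovecot-pop3d"},
    "lamp-server": {"apache2", "mariadb-server", "php"},
}

# "removed automatically one week after an RC bug is filed against it"
GRACE_PERIOD = timedelta(days=7)


def in_reference_installation(pkg: str) -> bool:
    return any(pkg in pkgs for pkgs in REFERENCE_INSTALLATIONS.values())


def removal_decision(pkg: str, rc_bug_filed: date, today: date,
                     affects_testing_version: bool) -> str:
    """Return "keep", "wait", "remove", or "escalate" for an RC-buggy package."""
    if not affects_testing_version:
        return "keep"  # the bug does not affect the version in testing
    if in_reference_installation(pkg):
        # Higher threshold (assumption): never auto-removed, humans decide.
        return "escalate"
    if today - rc_bug_filed >= GRACE_PERIOD:
        return "remove"
    return "wait"


filed = date(2014, 1, 1)
print(removal_decision("nethack-console", filed, filed + timedelta(days=8), True))
```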
Use automatic testing extensively
We have some automatic testing tools specifically for Debian: lintian, piuparts, adequate, autopkgtest, and probably more. We should use these much more extensively, and let them guide the migration of packages into testing.
Automatic testing will catch some classes of bugs much faster, and perhaps more reliably, than relying on bug reports. We need both. The job of automatic testing is not to prove the absence of bugs, but to establish a trusted lower limit for quality: it shows us that certain things work and will notify us if we ever break them. This gives us, the developers, more confidence that the changes we make are not too destructive, and notifies us if they are. Most importantly, automatic testing will find bugs faster, which then makes it easier to fix them, and reduces their impact.
Imagine a continuous integration system for Debian: for every new package upload to unstable, it builds and tests all the reference installations. If all builds succeed, and all tests pass, the package can be moved into testing at once. When you, a developer, upload a new package, you get notified about test results, and testing migration, within minutes.
The number of packages in Debian, and the amount of churn in unstable, make this impractical without massive amounts of hardware. However, we can get close: instead of testing each package separately, we can test together all the packages that have been uploaded to unstable since the previous run, which will mostly be a fairly small number of packages.
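The batched approach can be sketched as one loop over the reference installations. The `build_and_test` hook is hypothetical: it stands for whatever builds a reference installation from testing plus the new uploads and runs its test suite. The text does not say what happens when a batch fails; this sketch simply holds the whole batch back until the next run.

```python
from typing import Callable, List, Sequence

# Hypothetical hook: build the named reference installation from testing plus
# the given uploads, run its test suite, and report whether everything passed.
BuildAndTest = Callable[[str, Sequence[str]], bool]


def run_batch(
    new_uploads: Sequence[str],
    reference_installations: Sequence[str],
    build_and_test: BuildAndTest,
) -> List[str]:
    """Test all uploads since the previous run together, as one batch.

    Returns the packages that may migrate to testing.
    """
    if not new_uploads:
        return []  # nothing uploaded since the previous run
    for installation in reference_installations:
        if not build_and_test(installation, new_uploads):
            # Failing batch: hold everything back (a choice of this sketch).
            return []
    # Every reference installation built and passed its tests.
    return list(new_uploads)


def always_pass(installation: str, uploads: Sequence[str]) -> bool:
    return True  # stand-in for a real build-and-test run


print(run_batch(["hello", "nethack-console"], ["ssh-server", "mail-server"], always_pass))
```

An open question with any batching scheme is how to find the culprit when a batch fails, since one bad upload blocks the others in the same run.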
Ideally we will run the tests for each release architecture, but it may be enough to run them on amd64 only. We'll need to experiment with this.
Automatic tests do not need to have very much coverage in order to be quite useful. Even very simplistic tests, like the ones piuparts does, find quite a lot of problems. If we create a framework to run the tests which makes it easy to add more tests, we will in time accumulate a large test battery. Look at lintian: it has a staggering number of tests now, but they've been written over a period of more than a decade. Ideally, we can benefit from such tests that have already been written for other distributions, and share ours with them.
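A framework that "makes it easy to add more tests" can be as simple as a registry where each new check is one decorated function. The sketch below is hypothetical (it is not how piuparts, autopkgtest, or jenkins.debian.net are organised) and uses placeholder checks; a crashing check is recorded as a failure so that one broken test does not stop the rest of the run.

```python
from typing import Callable, Dict, List

# Maps a reference installation name to the checks registered for it.
TESTS: Dict[str, List[Callable[[], bool]]] = {}


def reference_test(installation: str):
    """Decorator that registers a check for the given reference installation."""
    def register(func: Callable[[], bool]) -> Callable[[], bool]:
        TESTS.setdefault(installation, []).append(func)
        return func
    return register


def run_tests(installation: str) -> Dict[str, bool]:
    """Run every check registered for `installation`; map check name to result."""
    results: Dict[str, bool] = {}
    for check in TESTS.get(installation, []):
        try:
            results[check.__name__] = bool(check())
        except Exception:
            # A crashing check counts as a failure instead of aborting the run.
            results[check.__name__] = False
    return results


# Placeholder checks, for illustration only.
@reference_test("ssh-server")
def example_passing_check() -> bool:
    return True


@reference_test("ssh-server")
def example_crashing_check() -> bool:
    raise RuntimeError("simulated failure")


print(run_tests("ssh-server"))
# {'example_passing_check': True, 'example_crashing_check': False}
```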
Tests for a running reference installation might include the following:
- Basic networking setup works: System responds to ping from the outside.
- Mail server responds appropriately on the SMTP, submission, IMAPS, and POP3S ports.
- LAMP server responds on the HTTP and HTTPS ports.
- A desktop system that automatically logs in a test user has the right processes running, and can start some common applications.
- In each case, it's possible to log in remotely with ssh and run "sudo apt-get install hello".
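The port-level checks in the list above could start as plain TCP probes using Python's standard `socket` module. This sketch assumes the standard port numbers (SMTP 25, submission 587, IMAPS 993, POP3S 995, HTTP 80, HTTPS 443). `port_responds` only checks that something accepts a connection; `smtp_greets` goes one step further and looks for the `220` greeting an SMTP server sends on connect. Properly testing IMAPS, POP3S, or HTTPS would also need a TLS handshake, which is left out here.

```python
import socket
from typing import Dict

# Standard port numbers for the services named in the list above.
MAIL_SERVER_PORTS = {"smtp": 25, "submission": 587, "imaps": 993, "pop3s": 995}
LAMP_SERVER_PORTS = {"http": 80, "https": 443}


def port_responds(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def smtp_greets(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if the server sends an SMTP "220" greeting on connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(512)
    except OSError:
        return False
    return banner.startswith(b"220")


def check_ports(host: str, ports: Dict[str, int]) -> Dict[str, bool]:
    """Probe each named port and report which ones accept connections."""
    return {name: port_responds(host, port) for name, port in ports.items()}
```

Against a freshly installed mail server reference installation, a run might call `check_ports(host, MAIL_SERVER_PORTS)` and `smtp_greets(host)`, where `host` is whatever address the test harness gave the machine; every value should be `True`.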
These are trivial, even simplistic tests. However, if they pass, we know that at least the basic, fundamental things in the system are not horribly broken: networking, system administration, and the software that is meant to start in that reference installation. Furthermore, we know that the debian-installer works. That's a good foundation for further hacking.
Holger Levsen is already doing at least some of this on http://jenkins.debian.net/ and he's happy to get help to improve that service further.
We believe, based on our experience as software developers, that adopting these suggestions will make the jessie release cycle and release process smoother, and increase the quality of the end result.
- Lars Wirzenius
- Russ Allbery
Is there anyone listening here?
The use of the adjective "automatic" in this article is somewhat confusing. It may be used in the context of test automation to describe the automatic generation of test cases or the automatic running of test cases, but I believe what was really intended by this article is "automated tests" (see Wikipedia: Test automation). -- Philip