Package Repository Analysis and Migration Automation

Mentor

Neil Williams codehelp@debian.org (codehelp on IRC, typically #emdebian, #debian-arm, #debian-uk and #debian-soc)

Synopsis

Emdebian uses a filter to select packages from the main Debian repositories that are considered useful to embedded devices, excluding the majority of packages. Processing the filter is automated, but maintaining the filter list is manual. This project seeks to automate certain elements of the filtering process to cope with three specific conditions:

  1. Packages which have been removed from Debian need to be removed from the filter - on a per suite basis.
  2. Packages which have been added to Debian to meet the dependency requirements of other packages already in the filter need to be added to the filter.
  3. Packages need to migrate from unstable into testing in a manner that ensures that all dependencies are met in each suite.
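
The first two conditions can be sketched as set operations over a parsed package index. This is a minimal illustration rather than the project's design: the function names are hypothetical, the parser handles only simple "Field: value" stanzas, dependency alternatives are reduced to their first option, and the mapping from binary to source package names is omitted.

```python
import re


def parse_stanzas(text):
    """Parse a Debian control-style index (Packages/Sources) into dicts.

    Simplified: continuation lines (starting with whitespace) are appended
    to the previous field; nothing else about the format is validated.
    """
    stanzas = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields, last = {}, None
        for line in block.splitlines():
            if line[:1].isspace() and last:
                fields[last] += " " + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                last = key.strip()
                fields[last] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def dependency_names(depends):
    """Return bare package names from a Depends field, ignoring versions.

    For alternatives ("a | b") only the first option is used, which is a
    simplification made for this sketch.
    """
    names = set()
    for clause in depends.split(","):
        first = clause.split("|")[0].strip()
        if first:
            names.add(re.split(r"[\s(:\[]", first, maxsplit=1)[0])
    return names


def filter_changes(filter_list, suite_text):
    """Return (to_remove, to_add) for one suite.

    to_remove: filter entries no longer present in the suite.
    to_add: dependencies of filtered packages that exist in the suite
            but are not yet in the filter.
    """
    wanted = set(filter_list)
    stanzas = parse_stanzas(suite_text)
    available = {s["Package"] for s in stanzas if "Package" in s}
    to_remove = wanted - available
    to_add = set()
    for s in stanzas:
        if s.get("Package") in wanted:
            to_add |= dependency_names(s.get("Depends", ""))
    to_add = (to_add & available) - wanted
    return sorted(to_remove), sorted(to_add)
```

Running filter_changes once per suite gives the per-suite removal and addition lists. A real implementation would also follow newly added packages' own dependencies until no further additions appear.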

Benefits to Debian

The aim is to produce three daily lists:

  1. source package names which need to be removed from each suite,
  2. source package names which need to be added to each suite and
  3. source package names to migrate between suites.

This will allow Emdebian to manage its own package repository quickly and effectively. Right now too much manual effort goes into maintaining these lists. Automating the migration of packages will save hours of work that are better spent on improving other features of Debian.

Deliverables

A Debian package to run on the Emdebian server as a cron task.

Project schedule

May 24 - Begin designing application structure and necessary algorithms

June 7 - Begin coding data package parser portion of the application

June 14 - Rigorous testing and documentation of first portion of the application

June 16 - Submit midterm evaluations

June 21 - Begin coding package comparison portion of the project

June 28 - Rigorous testing and documentation of second portion of the application

July 5 - Begin coding migration validation portion of the project

July 12 - Rigorous testing and documentation of third portion of the application

July 19 - Begin coding dependency satisfaction portion of the project

July 26 - Rigorous testing and documentation of the final portion of the application

August 2 - Ensure validity of output produced and make necessary corrections

August 9 - Finish writing application and spend time improving documentation and writing any necessary user guides

August 20 - Submit final evaluation

Project details

The EDOS project has done a lot of work, using OCaml, on modeling the dependencies between packages. A similar approach is needed to calculate the list of candidate packages which can be migrated at the same time.

Undoubtedly the most complex part of the project is to calculate the testing migrations where several criteria must be met:

  1. Version in Debian unstable must match version in Debian testing
  2. Version in Debian unstable must match version in Emdebian unstable
  3. Version in Emdebian unstable must be newer than Emdebian testing
  4. All architectures are compared, including source.
  5. All dependencies must migrate together, adding new packages to the filter where necessary.

(Emdebian versions use a suffix which needs to be handled before comparing version strings against Debian.)
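
As a sketch of the suffix handling, the following assumes the Emdebian suffix has the form "em1", "em2", and so on, appended to the Debian version; the pattern and the function names are illustrative assumptions and should be adjusted to the real convention.

```python
import re

# Assumption: the Emdebian suffix is "em" followed by digits at the very end
# of the version string, e.g. "1.2.3-4em1".
EM_SUFFIX = re.compile(r"em\d+$")


def debian_version(emdebian_version):
    """Strip the assumed Emdebian suffix, leaving the Debian version."""
    return EM_SUFFIX.sub("", emdebian_version)


def same_version(debian, emdebian):
    """True when an Emdebian version corresponds to the given Debian version."""
    return debian == debian_version(emdebian)
```

This covers only the "must match" criteria. The "must be newer" criterion needs a full Debian version ordering, for example as implemented by dpkg --compare-versions, which plain string comparison does not provide.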

The resulting code needs to run on a server as an automated task, using minimal resources and in a shorter time frame than can be achieved with the current Perl support.

So, the process by which packages are determined to be accepted is as follows:

  1. Emdebian uses Debian as 'parent'. The first criterion is that the package has already satisfied the criteria for Debian, which is a simple data parsing operation on the Packages files: if a package in file A is the same as the package in file B, then it has (at some point) met the criteria for Debian and is a candidate for us. The number of packages which fail this initial test varies according to where Emdebian is in the release cycle: after a release the number is very high, while during a freeze it can be very low (single digits).
  2. A simple check on our own Packages files (files C and D) shows whether we need to bother with this package or whether the work has already been done by a previous run. Many more packages drop out at this test: Debian has >20,000 packages and Emdebian only cares about ~2,000, so 90% of the work has gone by the time this criterion is completed.
  3. Whether we have the right package for a migration. Both of the following checks must pass:
    • Package in Debian unstable == Emdebian unstable, and
    • Package in Debian testing != Emdebian testing.
    If either check fails, there isn't a candidate for migration and the package drops through to "failed" in the output so that the problem can be investigated. (Maybe Emdebian unstable hasn't updated yet.)
  4. Now that the packages have been filtered, the dependency solving maths needs to be done. The dependencies of the package in unstable need to exist at the correct versions in testing. If this test fails, the package drops through to the "missing dependency" output - once missing packages are sorted out, the next run will be able to migrate the package.
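
The four checks above can be sketched as a single classification function. This is an illustration under stated assumptions, not the planned implementation: each suite is represented as a dict mapping package name to version (with any Emdebian suffix already removed), the result labels other than "failed" and "missing dependency" are invented for this example, and the dependency check looks only at whether a name exists in Emdebian testing, ignoring version constraints and dependencies that could migrate in the same run.

```python
def classify(name, deb_unstable, deb_testing, em_unstable, em_testing, depends):
    """Classify one package following the four checks described above.

    The *_unstable and *_testing arguments map package name -> version.
    `depends` maps a package name to the names it depends on in unstable.
    """
    # 1. Debian itself must already have this version in testing.
    if name not in deb_testing or deb_unstable.get(name) != deb_testing[name]:
        return "not migrated in Debian"
    # 2. Skip packages Emdebian does not carry, or that a previous run
    #    has already brought up to date in Emdebian testing.
    if name not in em_unstable:
        return "not in filter"
    if em_testing.get(name) == deb_testing[name]:
        return "already migrated"
    # 3. Emdebian unstable must match Debian unstable; Debian testing is
    #    known to differ from Emdebian testing at this point.
    if em_unstable[name] != deb_unstable[name]:
        return "failed"
    # 4. Simplified dependency check: every dependency must already be
    #    present in Emdebian testing.
    if any(dep not in em_testing for dep in depends.get(name, ())):
        return "missing dependency"
    return "migrate"
```

Ordering the checks from cheapest to most expensive mirrors the text: most packages drop out on the dictionary lookups in steps 1 and 2, so dependency solving only runs on the small remainder.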

Application Information