Debian Control Files Parsing and Editing Library

Abstract

In 400 characters or less:

This project proposes a common library for parsing and manipulating Debian control files, including debian/control, debian/copyright and debian/changelog. The first idea is to validate and parse these files, with both Strict and Quirks modes for the parser. The second is a new frontend for Debconf using Qt4 (for which Perl bindings will be written).


Introduction

At present, there is no common programmatic interface for handling Debian control files. These are special files included in Debian packages, and they underpin the functionality of APT and aptitude. They are text files with a specific format defined in the Debian Policy Manual, yet maintainers often edit them by hand, which is tedious and error-prone.

This proposal entails two important efforts: creating an interface for handling these files, and writing a Qt4 frontend for Debconf.

Since Debconf is written in Perl, the existing Perl bindings for Qt4 will need to be updated. There is a version available on CPAN (QtCore), but its last release was in February 2008. Worse, it contains an "unauthorized release" of the Qt top-level module (see the "QtCore-4.004 on CPAN" section for a detailed discussion). It is also plagued by test failures: as of April 11 2009, CPAN Testers reports 33 FAIL, 9 NA (Not Applicable) and 28 UNKNOWN results (see: QtCore Test Results).

Prior Art

Debconf and Perl-Qt4

Since Debconf is written in Perl, upgrading it to Qt4 means that Perl bindings must be available. Some work is currently being done on Perl-Qt4, but there are unresolved problems caused by threading in the Qt4 libraries (see: A Perl-Qt4 project page). The final part of this proposal will involve actively participating in the development of the Perl-Qt4 bindings, and any work done here will benefit the Debian community (for Debconf), the larger Perl community (which has had an outdated version of the Qt bindings on CPAN since Feb 2008), as well as KDE and Qt.

Alternatively, Debconf could be rewritten in C/C++ and then use Smoke. This could be a rather complex task, and whether it is worthwhile depends on how much progress the main Perl-Qt4 project has made by then.

QtCore-4.004 on CPAN

The most recent Qt4 binding available on CPAN seems to be unmaintained. Its last release was on 04 Feb 2008, and it contains an unauthorized release of the Qt top-level module. In addition, according to the Request Tracker, an outstanding bug (RT#32754) has gone unfixed for over a year. The CPAN Testers report summary is: FAIL (33), NA (9), UNKNOWN (28). Any one of these problems alone would be unacceptable for a production module, especially one needed by a universal operating system such as Debian.

Judging by the comments box on the project's main page, plenty of other issues also need fixing. The PerlQt4 project on Google Code is working on correcting this (it is an independent reimplementation of the Perl-Qt4 bindings using the Qt3 version as a base), but it has run into unresolved problems with threading under Perl.

Fixing these bindings and making them production-ready is of paramount importance for the Debian, KDE and Perl communities at large. This will need to begin with the removal of the existing Qt module on CPAN (published by Ashley Winters on 16 Apr 1997 and still at version 0.03), as well as the unauthorized release, QtCore-4.004 by Vadim Likhota, to be replaced by an official, community-supported and tested version of the Qt bindings.

This will also involve creating a community account (perhaps a KDE account) on PAUSE to assume maintainership of the package. Alternatively, I would be willing to act as maintainer of the Perl Qt bindings, or to share that responsibility with Chris Burel, the admin of the Google Code Perl-Qt4 project, in perpetuity.

Needless to say, keeping these bindings up to date and in good working condition helps to further KDE's mission of providing an easy-to-use graphical environment, both for users as well as developers. The current state of these bindings may have impaired adoption of Qt in the past -- after all, there are equivalent and well-maintained modules providing bindings for other toolkits, such as wxPerl for wxWidgets.

Config::Model

Dominique Dumont, the mentor for this proposal, is the primary author of Config::Model. This module provides a way to represent configuration files as object models. The idea is that the object model tree can be manipulated easily and extensibly -- adding new clauses to the Debian control files will not require code to be rewritten. Instead, since calls go through Config::Model, such changes will be virtually transparent to module users.

Project Ideas

  1. Build a programmatic model of debian/control, debian/copyright and debian/changelog files. This model will be able to validate and parse the files.
    • Perhaps separate Strict Mode and Quirks Mode parsers will be necessary
  2. Write a new frontend for Debconf using Qt4
    • Once phase I of this project is complete, users can be prompted with debian/control data or similar in Debconf. For example, if the copyright or licensing terms change, the user can be shown the change. Changes can be detected easily and elegantly with Config::Model.
    • Importantly, this will involve publishing Perl bindings for Qt4
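To make the Strict/Quirks distinction more concrete, here is a minimal C sketch of how a single field line might be split under each mode. It assumes the field-name rule from the Debian Policy Manual (printable US-ASCII excluding space and colon, not starting with "#" or "-"). The names dc_mode and dc_split_field are hypothetical, and tolerating whitespace before the colon is only one assumed example of a quirk the lenient mode could accept.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parser modes (names are illustrative only). */
typedef enum { DC_MODE_STRICT, DC_MODE_QUIRKS } dc_mode;

/* Field-name characters per Debian Policy: U+0021..U+0039 and
 * U+003B..U+007E, i.e. printable US-ASCII except space and colon. */
static bool dc_is_name_char(unsigned char c)
{
    return (c >= 0x21 && c <= 0x39) || (c >= 0x3B && c <= 0x7E);
}

/* Split a "Name: value" line. Returns the name length and points *value
 * at the value, or returns 0 if the line is rejected in this mode.
 * Strict: the colon must directly follow the name.
 * Quirks (assumed example): spaces/tabs before the colon are tolerated. */
size_t dc_split_field(const char *line, dc_mode mode, const char **value)
{
    size_t n = 0;

    if (line[0] == '#' || line[0] == '-')
        return 0;                       /* forbidden first characters */
    while (dc_is_name_char((unsigned char)line[n]))
        n++;
    size_t name_len = n;
    if (mode == DC_MODE_QUIRKS)
        while (line[n] == ' ' || line[n] == '\t')
            n++;
    if (name_len == 0 || line[n] != ':')
        return 0;
    n++;                                /* skip the colon */
    while (line[n] == ' ' || line[n] == '\t')
        n++;                            /* leading blanks are not part of the value */
    *value = line + n;
    return name_len;
}
```

Keeping both modes behind one entry point with a mode flag, rather than as two separate parsers, is one possible design; it keeps the two modes from drifting apart.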

Proposal

Control File Manipulation

Debconf + Qt4

Other Applications

Deliverables

The timelines leave some flexibility, partly because each involves complex dependencies that make timing difficult to predict accurately. The best estimates are given above.

Note that I have not programmed in Python before, so producing a usable Python API may take longer than expected. I do have previous experience with Perl and XS, so providing a Perl API as well as a C/C++ API should be straightforward.

Why write a control file parser in C?

I'm primarily a Perl author, and Perl has historically been a popular language for text processing (such as configuration file parsing) thanks to its regular expression support. On the other hand, parsing complex files with regular expressions is poor software engineering practice, particularly when strict adherence to the Debian Policy Manual is required, since the Policy is very specific in some areas. It is also harder to get precise error reporting out of regular expression matching. Further, I'm not sure how well PCRE deals with UTF-8.
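One concrete form of the error checking mentioned above is pinpointing exactly where input is malformed. As an illustration only (not a committed design), a hand-written C routine can report the byte offset of the first invalid UTF-8 sequence. The name dc_utf8_first_error is hypothetical; the checks follow the UTF-8 rules of RFC 3629 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF).

```c
#include <stddef.h>

/* Returns the byte offset of the first malformed UTF-8 sequence in
 * buf[0..len), or (size_t)-1 if the whole buffer is well-formed. */
size_t dc_utf8_first_error(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;            /* number of continuation bytes expected */
        unsigned long cp;       /* decoded code point */
        unsigned long min;      /* smallest code point valid for this length */

        if (c < 0x80) {         /* plain ASCII */
            i++;
            continue;
        }
        if ((c & 0xE0) == 0xC0)      { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return i;          /* stray continuation byte or invalid lead byte */

        if (len - i <= need)
            return i;           /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return i;       /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return i;           /* overlong, out of range, or surrogate */
        i += need + 1;
    }
    return (size_t)-1;
}
```

A parser built on a routine like this can turn the offset into a line and column for the maintainer, which is harder to obtain from a failed regular expression match.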

Moreover, since this library needs to be usable from C and Python (and possibly other languages), writing it in Perl would complicate matters. Most major languages are implemented in C and therefore provide a simple foreign function interface, particularly for C. This means that C++, Python and Perl programs can easily make use of the library if it is written in C.
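As a sketch of what an FFI-friendly C API could look like in practice, the hypothetical function below passes only plain C types and lets the caller own every buffer, so a Perl XS or Python ctypes binding never has to free memory allocated by the library. It assumes a single paragraph as input, handles single-line values only, and matches field names case-insensitively, as the Debian Policy Manual specifies.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive comparison of the first n bytes of a with b. */
static int dc_name_eq(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == '\0' ||
            tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Hypothetical entry point: look up field `name` in a NUL-terminated
 * paragraph and copy its first-line value into out[0..outlen).
 * Returns the full value length (like snprintf), or -1 if absent.
 * Continuation lines are ignored in this sketch. */
long dc_field_value(const char *paragraph, const char *name,
                    char *out, size_t outlen)
{
    size_t nlen = strlen(name);
    const char *line = paragraph;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        if (eol == NULL)
            eol = line + strlen(line);
        if (dc_name_eq(line, name, nlen) && line[nlen] == ':') {
            const char *v = line + nlen + 1;
            while (v < eol && (*v == ' ' || *v == '\t'))
                v++;                    /* skip blanks after the colon */
            size_t vlen = (size_t)(eol - v);
            if (outlen > 0) {           /* copy what fits, always terminate */
                size_t n = vlen < outlen - 1 ? vlen : outlen - 1;
                memcpy(out, v, n);
                out[n] = '\0';
            }
            return (long)vlen;
        }
        line = (*eol == '\n') ? eol + 1 : eol;
    }
    return -1;
}
```

Returning the full length, as snprintf does, lets a binding detect truncation and retry with a larger buffer without any library-side allocation.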

Which non-Perl tools are likely to use a control file parser? I don't know yet. Portability through coding in C was part of the initial proposal, as various individuals requested it. There should be no arbitrary restriction requiring Perl rather than any other language, and I'm not sure how one would inline Perl code inside, say, Python.

As a result, I've chosen to write the library in C. It has been suggested that the parser should use yacc/flex or similar tools. Another idea to explore is a tokenizing parser, which would scan the file and build a tree of C structs corresponding to its structure.
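A minimal sketch of the tokenizing approach, assuming a hand-written line-oriented scanner rather than yacc/flex: each line is classified as blank (paragraph separator), comment (a leading "#", as debian/control allows), continuation (leading space or tab) or "Name: value", and the result is a linked list of paragraph structs holding field structs. All type and function names are hypothetical, and field-name validation and the " ." blank-line convention are deliberately left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tree: paragraphs, each holding a list of fields.
 * Continuation lines are appended to the value, joined with '\n'. */
typedef struct dc_field {
    char *name, *value;
    struct dc_field *next;
} dc_field;

typedef struct dc_paragraph {
    dc_field *fields;
    struct dc_paragraph *next;
} dc_paragraph;

static char *dc_strndup(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    if (p != NULL) { memcpy(p, s, n); p[n] = '\0'; }
    return p;
}

void dc_free(dc_paragraph *p)
{
    while (p != NULL) {
        dc_paragraph *next_p = p->next;
        for (dc_field *f = p->fields, *next_f; f != NULL; f = next_f) {
            next_f = f->next;
            free(f->name); free(f->value); free(f);
        }
        free(p);
        p = next_p;
    }
}

/* Returns the paragraph list, or NULL with *err_line set to the 1-based
 * line number of a syntax or allocation error. */
dc_paragraph *dc_parse(const char *text, int *err_line)
{
    dc_paragraph *head = NULL, *tail = NULL, *para = NULL;
    dc_field *last = NULL;
    int lineno = 0;

    *err_line = 0;
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        lineno++;
        if (strspn(text, " \t") == len) {          /* blank: end paragraph */
            para = NULL;
            last = NULL;
        } else if (text[0] == '#') {
            /* comment line: skipped */
        } else if (text[0] == ' ' || text[0] == '\t') {
            if (last == NULL)
                goto fail;                         /* continuation without field */
            size_t old = strlen(last->value);
            char *v = realloc(last->value, old + len + 1);
            if (v == NULL)
                goto fail;
            v[old] = '\n';                         /* drop the leading blank */
            memcpy(v + old + 1, text + 1, len - 1);
            v[old + len] = '\0';
            last->value = v;
        } else {
            const char *colon = memchr(text, ':', len);
            if (colon == NULL || colon == text)
                goto fail;                         /* not "Name: value" */
            if (para == NULL) {                    /* first field of a paragraph */
                if ((para = calloc(1, sizeof *para)) == NULL)
                    goto fail;
                if (tail != NULL) tail->next = para; else head = para;
                tail = para;
            }
            dc_field *f = calloc(1, sizeof *f);
            if (f == NULL)
                goto fail;
            if (last != NULL) last->next = f; else para->fields = f;
            last = f;
            const char *v = colon + 1;
            while (v < text + len && (*v == ' ' || *v == '\t'))
                v++;
            f->name = dc_strndup(text, (size_t)(colon - text));
            f->value = dc_strndup(v, (size_t)(text + len - v));
            if (f->name == NULL || f->value == NULL)
                goto fail;
        }
        text += len;
        if (*text == '\n')
            text++;
    }
    return head;
fail:
    *err_line = lineno;
    dc_free(head);
    return NULL;
}
```

Linking each new node into the tree before filling it in means a single dc_free call can clean up after any failure.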

The same library should be able to both read and write the control files. That is what distinguishes this package from existing implementations based on regular expressions, such as the Dpkg series.
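For the writing side, a hypothetical helper might serialise one field back to text. Lines after the first become continuation lines prefixed with a space, and empty lines are written as " .", the Debian Policy convention for multiline fields such as Description. Trailing newlines are dropped so that the output never contains a whitespace-only line, which a reader could mistake for a paragraph separator. The name dc_format_field is an assumption for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Serialise one field as "Name: value\n" with continuation lines.
 * Returns a malloc'd string (caller frees), or NULL on allocation failure. */
char *dc_format_field(const char *name, const char *value)
{
    size_t nlen = strlen(name), vlen = strlen(value);

    while (vlen > 0 && value[vlen - 1] == '\n')
        vlen--;                      /* trailing newlines would become blank lines */
    /* Worst case every '\n' grows to "\n ." (3 bytes); add ':', ' ', '\n', NUL. */
    char *out = malloc(nlen + 3 * vlen + 4);
    if (out == NULL)
        return NULL;
    char *w = out;
    memcpy(w, name, nlen);
    w += nlen;
    *w++ = ':';
    if (vlen > 0 && value[0] != '\n')
        *w++ = ' ';                  /* "Name: first line" */
    for (size_t i = 0; i < vlen; i++) {
        *w++ = value[i];
        if (value[i] == '\n') {
            *w++ = ' ';              /* continuation-line marker */
            if (value[i + 1] == '\n')
                *w++ = '.';          /* empty line inside the value */
        }
    }
    *w++ = '\n';
    *w = '\0';
    return out;
}
```

For example, the value "short summary\nFirst paragraph.\n\nSecond paragraph." for Description is written as a summary line followed by three continuation lines, the middle one being " .".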

Where Config::Model Comes In

So, where does Config::Model come in? Once configuration files have been parsed, Config::Model provides a convenient way of representing and manipulating the data. With the C code parsing the files and returning the information, Config::Model can take this information and construct an in-memory model of the file, which can then be manipulated at will and written back to disk. This makes it useful for the "dh-make-perl --refresh" command, which updates Debian Perl packages.

Outstanding Issues

But this leaves an unresolved problem. Only Perl programs will be able to /manipulate/ files, since the model is written in Perl. However, by and large, the important requirement is that programs can /read/ the control files in a reliable manner.

Since dh-make-perl is itself written in Perl, this won't be a problem for it. But not everyone wants to use Perl in their build process, so a future goal will be exposing this model to other languages.

A heavyweight but straightforward and quick solution is to embed a Perl interpreter in those applications (i.e., link the C program against libperl so that it can compile and run Perl code; many projects do this, including mod_perl and irssi). This is a poor fit for programs like aptitude, but aptitude will generally only require read access, in which case the C library proposed here fits the bill.

Another solution would be to use Config::Model's STDIN/STDOUT interface to send configuration commands and read the results (i.e., fork the command config-model-edit -model DebControl -ui simple and communicate through the forked process's STDIN and STDOUT). This interface could be driven with an Expect-like module. It is low-tech but has the advantage of being language-agnostic.
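The plumbing for this STDIN/STDOUT approach is plain POSIX: two pipes, fork() and execvp(). The sketch below is generic (dc_spawn is a hypothetical name) and does not model config-model-edit's actual command syntax. A caller would pass an argv such as {"config-model-edit", "-model", "DebControl", "-ui", "simple", NULL} and then exchange lines over the two returned descriptors.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start argv[0] (searched in PATH) with its stdin and stdout connected to
 * pipes. On success returns 0 and sets *to_child (write end) and
 * *from_child (read end); returns -1 on failure. */
int dc_spawn(char *const argv[], pid_t *pid, int *to_child, int *from_child)
{
    int in[2], out[2];          /* in: parent -> child, out: child -> parent */

    if (pipe(in) == -1)
        return -1;
    if (pipe(out) == -1) {
        close(in[0]); close(in[1]);
        return -1;
    }
    pid_t p = fork();
    if (p == -1) {
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        return -1;
    }
    if (p == 0) {               /* child: wire pipes to stdin/stdout */
        dup2(in[0], STDIN_FILENO);
        dup2(out[1], STDOUT_FILENO);
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }
    close(in[0]);               /* parent keeps only its own ends */
    close(out[1]);
    *pid = p;
    *to_child = in[1];
    *from_child = out[0];
    return 0;
}
```

For an interactive, Expect-style dialogue the caller should read each reply before sending the next command; writing a large batch without reading can deadlock once both pipe buffers fill. When done, closing the write end and calling waitpid() reaps the child.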

Supported Languages

The library will support calls from the following languages:

Others? Please add them. Once the C code is built, creating bindings for other languages should be straightforward, as many of them (Perl and Python in particular) support foreign function interfaces to C.

Likely Mentors

Dominique Dumont has offered to mentor the project, under the auspices of a Debian Developer who would conduct DD-only tasks, since Dominique is not a Debian Developer. Additionally, members of the pkg-perl team have always been helpful in answering questions, so I think I can count on them to provide a bit of co-mentorship if necessary.

License

I don't mind releasing my code under any license. Some of my other distributions have been released into the public domain. I will release my code under whatever license is preferred by the committee in question, be it the BSD License, the CC0 License, or the Artistic/GPL combination. I'm totally flexible here.

Bio

My name is Jonathan Yu. I am a third-year student completing undergraduate programs in Electrical Engineering and Computer Science concurrently. I'm not sure I'm the best person to work on such an undertaking, but I definitely have the passion to do so. I spend most of my free time contributing to open source projects and publishing modules on CPAN (as FREQUENCY). I have spent a few years coding Perl, and have used Java/C/C++ for school. Basically, I think I will be able to write Perl 5 code to do this in a maintainable, extensible way. I've also got experience programming Perl XS code, so I can optimize anything if necessary to make it run really, really fast :-), which should be helpful given the huge number of packages involved.

I develop on Debian Linux with a variety of tools; I'm not sure what to list here, so I won't be too specific. I keep up with industry developments by reading reddit's programming board and blogs. The projects I have done have been smaller, but I see this proposal as a series of small projects rather than one large monolithic undertaking. As a result, I am confident I can deliver on all of my promises before the summer concludes.

I'm really just learning, and am very grateful for the community's help in getting me where I am today. I consider myself a decent coder, and while I'm learning new things all the time, I feel I know enough to make a significant positive contribution to the community. The problem I face is that I do not have much free time, since I spend most of the year (8 months) in school and the summer working a full-time job. If I am granted this opportunity, I will be able to devote one hundred percent of my time and energy to these projects, which I'd really like to see realized in the near future.

Eligibility

From reading the guidelines, I believe I'm an eligible student and meet Google's legal requirements. I have all of the necessary paperwork for Google's proof of eligibility, and am willing to produce copies of my transcript or whatever else is necessary. I don't anticipate any issues if and when my project is selected.

Debian Developer

I'm not a Debian Developer. However, I am a pkg-perl member and a Debian user. Packaging is something I enjoy spending my time on, and I would just like more time to do it.
