Collaborative Repository of Meta-Informations
Created: 2006-05-27
Contributors: RaphaelHertzog
Summary
The aim of CRMI is to provide a collaboratively maintained repository of meta-information attached to (source) packages. Meta-information includes, for example: various upstream URLs (website, VCS, bug tracker, development mailing list, support mailing list), watch files, the URL of the VCS used for Debian maintenance, statistics about bugs, etc.
Rationale
We could extend the debian/control file to include more information, but the experience with debian/watch files shows that you can't rely on maintainers to create them and keep them up to date. Furthermore, it's an unnecessary burden for them. Providing this meta-information is usually a very simple job that could be done by any of the Debian contributors who are not interested in packaging.
This meta-information will be useful in many different contexts; see the use cases.
Use Cases
- Bob, a Debian contributor, has a few spare hours and wants to help Debian, but his technical skills do not allow him to fix bugs. However, he knows that he can improve Debian by extending the CRMI. So he fires up his browser, goes to the CRMI web page, identifies himself and is presented with a list of the meta-information he has submitted. He then browses the list of packages lacking meta-information and starts to collect and submit it.
- Jim, a Debian maintainer, doesn't put watch files in his packages but would like to contribute them to the CRMI anyway, because he wants to be informed of new upstream versions on his DDPO web page. He goes to the CRMI web page, identifies himself and is presented with a list of his packages. He goes through them and submits a watch file for each package.
Package Tracking System
- The PTS would rely on the CRMI for the meta-information it shares with DDPO (bug statistics, lintian statistics, Standards-Version, etc.).
- The PTS is often used by QA people to check the "health" of a package. With CRMI, the PTS could include links to the upstream website and mailing lists, so QA people who are considering removing the package can quickly check whether the software is still actively maintained upstream.
packages.debian.org
A user goes to packages.debian.org and checks the details of a package.
Along with the usual information, he finds a link to the upstream website. There's also a link to a mailing list where he can ask for help. If the HTTP request has an Accept-Language header, the user is redirected to a mailing list in his language if one exists; otherwise he's directed to the default user list in English.
- It could also display user-supplied ratings of the packages as well as user-supplied screenshots and popularity information.
- The changelog extraction currently done by p.d.o could be delegated to CRMI in order to make it available to others as well.
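The Accept-Language redirection described above could work roughly as follows. This is a minimal sketch under stated assumptions: the `lists` mapping (language tag to list address, with a `"default"` entry for the English user list) and the function names are hypothetical, and only the `q=` weights of the header are honoured, not every detail of the HTTP specification.

```python
from typing import Dict, List, Optional


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # malformed weight: treat the tag as unacceptable
        if quality > 0:
            weighted.append((quality, tag.strip().lower()))
    # Python's sort is stable, so tags with equal weight keep their header order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in weighted]


def pick_support_list(lists: Dict[str, str], accept_language: Optional[str]) -> str:
    """Choose a localized support list if one matches, else the "default" one."""
    for tag in parse_accept_language(accept_language or ""):
        if tag in lists:
            return lists[tag]
        primary = tag.split("-")[0]  # e.g. "fr-ch" falls back to "fr"
        if primary in lists:
            return lists[primary]
    return lists["default"]
```

For example, with `{"default": "debian-user@lists.debian.org", "fr": "debian-user-french@lists.debian.org"}` a browser sending `fr-CH, fr;q=0.9` gets the French list, while one sending `de` falls back to the English one.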
Adept_installer
- This program provides a view of all the applications available in the distribution and lets the user select them for installation. In order to know the available applications, it needs a copy of all the .desktop files.
- The CRMI automatically extracts those files and makes them available as a .tar.gz (or maybe even as a .deb).
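A minimal sketch of the .tar.gz export, assuming each binary package has already been unpacked (for example with `dpkg-deb -x`) into its own subdirectory named after the package, with its .desktop files in `usr/share/applications/`. The function name and the archive layout `<package>/<file>.desktop` are hypothetical choices, not something the CRMI design fixes.

```python
import tarfile
from pathlib import Path
from typing import List, Union


def bundle_desktop_files(unpacked_root: Union[str, Path], output: Union[str, Path]) -> List[str]:
    """Pack every <pkg>/usr/share/applications/*.desktop into a gzip tarball.

    Returns the archive member names, stored as <pkg>/<file>.desktop.
    """
    root = Path(unpacked_root)
    names = []
    with tarfile.open(output, "w:gz") as tar:
        # sorted() gives a deterministic member order across runs
        for path in sorted(root.glob("*/usr/share/applications/*.desktop")):
            package = path.relative_to(root).parts[0]
            arcname = f"{package}/{path.name}"
            tar.add(path, arcname=arcname)
            names.append(arcname)
    return names
```

Keeping the package name as the top-level directory lets a client such as adept_installer map each .desktop file back to the package it should install.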
DEHS / New upstream version scanner
- To be effective, DEHS needs watch files for all source packages. CRMI is the new reference for watch files.
- Watch files extracted from source packages are put in the CRMI.
- Contributors can update the watch files via the web interface. The system keeps track of the timestamps of the watch file provided by the source package and of the user-supplied watch file, and uses whichever is more recent as the reference.
- The result of the scan for new upstream versions is put in the CRMI as well (field "New-upstream-version").
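The selection rule for the reference watch file can be sketched as below. This assumes each candidate carries a timestamp (upload time for the packaged file, submission time for the user-supplied one); the `WatchFile` type, its `origin` labels and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class WatchFile:
    content: str
    timestamp: datetime  # upload time (package) or submission time (user)
    origin: str          # hypothetical labels: "package" or "user"


def reference_watch_file(from_package: Optional[WatchFile],
                         from_user: Optional[WatchFile]) -> Optional[WatchFile]:
    """Return whichever available watch file is more recent, or None."""
    candidates = [w for w in (from_package, from_user) if w is not None]
    if not candidates:
        return None
    # max() keeps the first maximal element, so on a tie the packaged file wins.
    return max(candidates, key=lambda w: w.timestamp)
```

The timestamps must be comparable, so in practice they would all be stored in the same time zone (e.g. UTC). Preferring the packaged file on a tie is a design choice of this sketch.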
Design
List of meta-information to handle
- watch files
- upstream-related information (URL, bug tracker, VCS, etc.)
- screenshot
- multiple .desktop files
- debtags
- user ratings
- popcon
- changelog (history of versions, ...)
- lintian information
- copyright (license)
- doap description of the project
- sloccount output (number of lines of source code)
- any other information relevant for comparing two packages that do the same thing (upstream activity: number of posts/month on the ML, number of CVS/SVN commits, etc.)
Features
- Keep history of changes made by users
- Support (and enforce) different policies for reviewing changes
- Enable localization of some fields (e.g. the user support mailing list)
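The first and third features can be combined in a small data model: every field value is kept as an append-only list of revisions, and localized fields are keyed by language. A minimal in-memory sketch; the class names, the `"default"` language key and the use of `None` to record a deletion are all assumptions, and review policies are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class Revision:
    value: Optional[str]  # None records a deletion
    author: str
    when: datetime


@dataclass
class FieldHistory:
    """Append-only list of revisions for one field; nothing is overwritten."""
    revisions: List[Revision] = field(default_factory=list)

    def update(self, value: Optional[str], author: str,
               when: Optional[datetime] = None) -> None:
        self.revisions.append(Revision(value, author, when or datetime.now(timezone.utc)))

    def delete(self, author: str) -> None:
        self.update(None, author)

    def current(self) -> Optional[str]:
        return self.revisions[-1].value if self.revisions else None


class MetaStore:
    """Histories keyed by (source package, field name, language)."""

    def __init__(self) -> None:
        self._histories: Dict[Tuple[str, str, str], FieldHistory] = {}

    def history(self, package: str, name: str, lang: str = "default") -> FieldHistory:
        return self._histories.setdefault((package, name, lang), FieldHistory())
```

Because deletions are recorded as revisions too, the "browse history" view can always show who removed a value and when.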
Output formats
- one meta-info for all packages (http://.../fields/<field>)
- all meta-info for one package (http://.../<pkg>)
- one meta-info for one package (http://.../<pkg>/<field>)
- Build some .deb packages providing meta-information needed by end-user applications (e.g. all .desktop files for adept_installer)
- rsync:// access to the tree (mirroring on other machines)
- doap description (RDF format)?
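The three HTTP access patterns could be dispatched with a small path parser. The sketch only looks at the part of the path after the base URL, which the page leaves unspecified; the `Query` type and `parse_path` name are hypothetical.

```python
from typing import NamedTuple, Optional


class Query(NamedTuple):
    package: Optional[str]  # None means "all packages"
    field: Optional[str]    # None means "all fields"


def parse_path(path: str) -> Query:
    """Map /fields/<field>, /<pkg> and /<pkg>/<field> to a Query."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) == 2 and parts[0] == "fields":
        return Query(None, parts[1])       # one meta-info for all packages
    if len(parts) == 1:
        return Query(parts[0], None)       # all meta-info for one package
    if len(parts) == 2:
        return Query(parts[0], parts[1])   # one meta-info for one package
    raise ValueError(f"unsupported path: {path!r}")
```

One consequence of this URL scheme is that "fields" becomes a reserved first segment: a source package with that exact name could still be listed as a whole, but not queried field by field.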
Web interface
- The main page is a form where you can type the source package name. Some AJAX may assist during typing (autocompletion).
- The package page displays a list of all meta-information for the package and clearly shows the fields that have not been filled yet.
- Near each field there are buttons "edit", "browse history", "delete".
TODO: complete
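The autocompletion on the main page only needs a prefix lookup over the sorted list of source package names, and a binary search finds the first candidate. A minimal sketch, assuming the server keeps such a sorted list in memory; the function name and the `limit` parameter are hypothetical.

```python
import bisect
from typing import List


def complete(sorted_names: List[str], prefix: str, limit: int = 10) -> List[str]:
    """Return up to `limit` names from a sorted list that start with `prefix`."""
    results = []
    # All names starting with `prefix` sit in one contiguous run of the sorted
    # list, beginning at the insertion point of the prefix itself.
    i = bisect.bisect_left(sorted_names, prefix)
    while i < len(sorted_names) and len(results) < limit and sorted_names[i].startswith(prefix):
        results.append(sorted_names[i])
        i += 1
    return results
```

Each keystroke then costs a logarithmic search plus at most `limit` comparisons, which keeps the AJAX round trip cheap even for the full archive.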
Mail interface
TODO: complete.
Implementation
Description of meta-info
The list of meta-information that CRMI can handle is described in a global configuration file. For each "field", the configuration file indicates:
- the backend used to store the meta-information:
  - RFC822(filename): the field is stored as a header in the RFC822-style file filename
  - RCS(filename): the field is stored (with history) in an RCS file
  - BDB(filename): the field is stored (by timestamp) in a Berkeley DB file
  - File(filename): the field is stored in its own file (no history kept)
  - FileSet(directory): the field is a set of (binary) files
- whether the field is binary, single-line text or multi-line text
- whether the field can have multiple values
- whether the field can be localized
- whether the field can be edited by end users
- the review policy (not implemented in the first version)
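The page does not specify the syntax of the configuration file. As a sketch, assume an INI-style file with one section per field, a `backend` key written like the backend specs above (e.g. `File(watch)`), and hypothetical `type`, `multiple`, `localized` and `editable` keys; the review policy is omitted, as in the first version.

```python
import configparser
import re
from dataclasses import dataclass
from typing import Dict

BACKENDS = {"RFC822", "RCS", "BDB", "File", "FileSet"}
TYPES = {"binary", "line", "multiline"}  # hypothetical names for the three kinds
_SPEC = re.compile(r"^(\w+)\((.+)\)$")    # e.g. "RFC822(upstream)"


@dataclass(frozen=True)
class FieldSpec:
    name: str
    backend: str
    location: str   # filename or directory given to the backend
    kind: str
    multiple: bool
    localized: bool
    editable: bool


def load_fields(text: str) -> Dict[str, FieldSpec]:
    """Parse the (assumed) INI-style field description into FieldSpec objects."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    fields = {}
    for name in parser.sections():
        sec = parser[name]
        match = _SPEC.match(sec["backend"].strip())
        if not match or match.group(1) not in BACKENDS:
            raise ValueError(f"{name}: bad backend {sec['backend']!r}")
        kind = sec.get("type", "line")
        if kind not in TYPES:
            raise ValueError(f"{name}: bad type {kind!r}")
        fields[name] = FieldSpec(
            name=name,
            backend=match.group(1),
            location=match.group(2),
            kind=kind,
            multiple=sec.getboolean("multiple", fallback=False),
            localized=sec.getboolean("localized", fallback=False),
            editable=sec.getboolean("editable", fallback=True),
        )
    return fields
```

Validating the backend name at load time means a typo in the configuration fails immediately instead of when a contributor first edits that field.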
Structure of the storage
The directory is structured following the usual pool structure. All files concerning a given source package are stored in pool/<first-char>/<src-pkg>/. The content of that directory depends on the backends used to store the meta-information.
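A minimal sketch of the path computation, plus a reader for the RFC822 backend that joins continuation lines. The function names are hypothetical. The Debian archive's own pool additionally uses a four-character prefix such as `libx` for source packages whose names start with `lib`; the page only mentions the first character, so the sketch keeps to that.

```python
from pathlib import Path
from typing import Dict, Optional, Union


def package_dir(root: Union[str, Path], source_package: str) -> Path:
    """Return <root>/pool/<first-char>/<src-pkg>/ for a source package."""
    if not source_package:
        raise ValueError("empty source package name")
    return Path(root) / "pool" / source_package[0] / source_package


def read_rfc822_field(path: Union[str, Path], name: str) -> Optional[str]:
    """Read one field from an RFC822-style file (sketch of the RFC822 backend).

    Continuation lines (starting with a space or tab) are joined with newlines;
    field names are matched case-insensitively.
    """
    values: Dict[str, str] = {}
    current: Optional[str] = None
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line[:1] in (" ", "\t") and current is not None:
            values[current] += "\n" + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip().lower()
            values[current] = value.strip()
    return values.get(name.lower())
```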
API
TODO: complete.
Outstanding Issues
Links
Someone wrote some freshmeat parser code: http://ifup.org/git/?p=crmi.git;a=summary Use "git-clone http://ifup.org/git/scm/crmi.git" to get the code. Related to http://abe.osuosl.org/~philips/fm/?p=cups (example link).