Mole
Mole is a work-in-progress QA project. See also ["CRMI"].
The goal of Mole is to have one central location where information about packages and other Debian-related objects (such as bugs or mirrors) can be stored.
Mole is currently being worked on by ["Jeroen"] van Wolffelaar as part of his [http://code.google.com/soc/debian/appinfo.html?csaid=31AA1D661D273528 Google Summer of Code project].
See ["Mole/Development"] for a wikipage listing current development status.
What is Mole?
Mole is intended to be an easily accessible piece of infrastructure where anyone can add data repositories and submit data in various easy ways into readily available data storage types. All this data is then easily and efficiently available, both through programmatic microqueries or a web interface and as whole datasets, including replication. In addition, Mole provides infrastructure for initiating datamining: generating data by running specific code over each result from another table, for example.
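The datamining idea ("specific code run over each result from another table") can be illustrated with a minimal sketch. It is not Mole's actual interface: a plain Python dict stands in for a table, and the names `derive_table`, `count_lines` and the sample entry are hypothetical.

```python
def derive_table(source, analyse):
    """Build a new table by running `analyse` over each entry of `source`.

    `source` is a plain dict standing in for a Mole table (an assumption).
    """
    derived = {}
    for key, value in source.items():
        derived[key] = analyse(key, value)
    return derived


def count_lines(key, value):
    # Hypothetical "interesting bit": count the lines of a stored text.
    return len(value.splitlines())


# Hypothetical source table: package name -> control file text.
control_files = {"hello_1.0": "Source: hello\nSection: devel\n"}
line_counts = derive_table(control_files, count_lines)
```

The point of the split is that only `count_lines` is specific to the check; the loop over the source table is the part the infrastructure would take care of.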
Advantages
- Random ideas for archive-wide checks, datamining on all bugs, and so on can be implemented very easily by any DD without the need to program the 'boring' infrastructure around them; one only needs to program the interesting bits
- Results of existing QA and other datamining efforts and archive checks are made easily available to anyone: for humans via the Mole web interface, and for further automatic processing via a couple of standard interfaces. This includes lintian results, results of various rebuild efforts, piuparts, bug summaries, extraction of changelog files, dependency checks, etc.
- Powerful new possibilities arise to combine existing information in new ways without the need to coerce information into compatible formats
- Existing and future data gathering can easily be made to also process secondary archives, such as security.debian.org, volatile and backports, without the need to specifically target those archives
Sorts of information available
There are several classes of information:
- Information extracted directly from the packages
- Generated information, for example: running lintian over a package, rebuilding a package
- User-supplied information (screenshots, descriptions)
- And more
Storage formats
Multiple storage types are possible. At the moment two are defined, both for 'fixed' types of information that do not change over time, such as "the control file out of a source package" (unlike, for example, the result of "rebuilding the package"):
- Bdb: a Berkeley DB, atomically moved over the public one after a set of updates, so that reading without locking is possible
- HashfileBDb: a Berkeley DB with SHA-1 hashes, with the actual data in gzipped files named after the hashes. This is space-efficient thanks to gzip and to storing identical data only once, for example changelogs (which are often the same across builds on all architectures).
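The hash-file idea can be sketched roughly as follows. This is not Mole's code: the class name `HashFileStore`, its methods and the `.gz` file naming are hypothetical, and a plain dict stands in for the Berkeley DB index. The sketch also writes each blob to a temporary file and renames it into place, which is an assumption borrowed from the "atomically moved" approach described for Bdb.

```python
import gzip
import hashlib
import os
import tempfile


class HashFileStore:
    """Hypothetical sketch: key -> SHA-1 index, gzipped blobs named by hash."""

    def __init__(self, root):
        self.root = root
        self.index = {}  # stands in for the Berkeley DB (assumption)
        os.makedirs(root, exist_ok=True)

    def put(self, key, data):
        digest = hashlib.sha1(data).hexdigest()
        path = os.path.join(self.root, digest + ".gz")
        if not os.path.exists(path):
            # Identical data is stored only once. Write to a temporary
            # file first, then rename it into place atomically.
            fd, tmp = tempfile.mkstemp(dir=self.root)
            os.close(fd)
            with gzip.open(tmp, "wb") as gz:
                gz.write(data)
            os.replace(tmp, path)
        self.index[key] = digest
        return digest

    def get(self, key):
        path = os.path.join(self.root, self.index[key] + ".gz")
        with gzip.open(path, "rb") as gz:
            return gz.read()
```

Storing the same changelog under two keys, say one per architecture, leaves a single gzipped file on disk with both index entries pointing at it.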
Examples
- All .desktop files from all .debs in unstable & testing
- Lintian results on all source & binary packages
- md5sums of all files in all .debs
For the raw data, see: http://qa.debian.org/data/mole/db
Or, for a very slim web interface: http://qa.debian.org/cgi-bin/mole
More information
The code is available to Debian Developers at merkel:/org/qa.debian.org/mole. It is also in Subversion: svn.debian.org, repository "qa", subdir "mole".
The primary author is ["Jeroen"] van Wolffelaar <jeroen@debian.org>