Differences between revisions 11 and 12
Revision 11 as of 2011-04-05 11:46:13
Size: 3623
Editor: PaulWise
Comment: use alias instead of server
Revision 12 as of 2019-12-17 04:26:29
Size: 518
Editor: nyov
Comment: Semi-deleting page, replace it with a link to UDD.
Line 3: Line 3:
Mole is a QA work-in-progress project. Also look at [[CRMI]].

The goal of Mole is to have one central location where information about packages and other Debian-related objects (such as bugs or mirrors) can be stored.

Mole is currently being worked on by [[Jeroen]] van Wolffelaar as part of his [[http://code.google.com/soc/debian/appinfo.html?csaid=31AA1D661D273528|Google Summer of Code project]].

See [[Mole/Development]] for a wikipage listing current development status.

== What is Mole? ==

Mole is intended to be an easily accessible piece of infrastructure where
anyone can add data repositories and have actual data submitted in various
easy ways into readily available data storage types. All this data is then
easily and efficiently available, both through programmatic microqueries and
via a web interface, and also as whole datasets, including replication. In
addition to this, Mole provides infrastructure for initiating data mining:
generating data by having specific code run over each result from another
table, for example.
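
The data mining idea above (run some code over each result from another table to produce a new table) can be sketched as follows. This is only an illustration, not Mole's actual code: tables are modelled as plain Python dicts, and `derive_table`, `count_lines` and the sample changelog entries are hypothetical names and data.

```python
def derive_table(source, check):
    """Run `check` over every (key, value) pair of `source`.

    Returns a new table that maps each source key to the check's result.
    """
    return {key: check(key, value) for key, value in source.items()}


def count_lines(_key, text):
    """Example check (hypothetical): count the lines in one stored file."""
    return len(text.splitlines())


# A made-up source table: package version -> changelog text.
changelogs = {
    "foo_1.0": "foo (1.0) unstable\n\n  * Initial release.\n",
    "bar_2.1": "bar (2.1) unstable\n",
}

# The derived table holds one result per entry of the source table.
line_counts = derive_table(changelogs, count_lines)
```

Only `count_lines` is specific to the question being asked; the loop over the source table is the 'boring' part that the infrastructure would provide.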

== Advantages ==

  1. It will be very easy for any DD to implement random ideas for archive-wide checks, data mining on all bugs, and so on, without needing to program the 'boring' infrastructure around them: one only needs to program the interesting bits
  1. Results of existing QA and other data mining and archive checks are made easily available to anyone: to humans via the Mole web interface, and to further automatic processing via a couple of standard interfaces. This includes lintian results, results of various rebuild efforts, piuparts, but also bug summaries, extraction of changelog files, dependency checks, etc.
  1. Powerful new possibilities arise to combine existing information in new ways without the need to coerce information into compatible formats
  1. Existing and future data gathering can easily be made to also process secondary archives, such as security.debian.org, volatile and backports, without the need to specifically target those archives
Mole was a QA work-in-progress project in 2007, developed by [[Jeroen]] van Wolffelaar <jeroen@debian.org> as part of his Google Summer of Code project.<<BR>>
Source code and services are no longer online.
Line 30: Line 7:
== Sorts of information available ==
Its idea has been supplanted by UDD, the [[UltimateDebianDatabase]].
Line 32: Line 9:
There are several classes of information:

 * Extracted information directly from the packages
 * Generated information, for example: running lintian over a package, rebuilding a package
 * User-supplied information (screenshots, descriptions)
 * And more

== Storage formats ==

Multiple storage types are possible. At the moment two are defined, both for 'fixed' types of information (information that doesn't change over time), such as "the control file out of a source package", as opposed to, for example, "rebuilding the package".

 * Bdb: a Berkeley DB, atomically moved over the public one after a set of updates, so that reading without locking is possible
 * HashfileBDb: a Berkeley DB of SHA-1 hashes, with the actual data in gzipped files named after the hashes. This is space-efficient thanks to gzip and because identical data is stored only once, which suits, for example, changelogs (which are often the same across builds on all architectures).
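
The two storage ideas above can be sketched together as follows. This is a minimal illustration under stated assumptions, not Mole's implementation: a JSON file stands in for the Berkeley DB index, and the class name `HashfileStore`, its methods, and the file layout are all hypothetical. It shows the data being stored once per SHA-1 hash in a gzipped file, and the index being written to a temporary file and then renamed over the public one, so readers never see a half-updated index.

```python
import gzip
import hashlib
import json
import os
import tempfile


class HashfileStore:
    """Toy store (hypothetical): key -> SHA-1 index, plus gzipped blob files."""

    def __init__(self, root):
        self.root = root
        # Assumption: a JSON file replaces the Berkeley DB used by Mole.
        self.index_path = os.path.join(root, "index.json")
        os.makedirs(root, exist_ok=True)

    def _load_index(self):
        try:
            with open(self.index_path, encoding="utf-8") as fh:
                return json.load(fh)
        except FileNotFoundError:
            return {}

    def put_many(self, items):
        """Store a batch of {key: bytes}, then publish the new index."""
        index = self._load_index()
        for key, data in items.items():
            digest = hashlib.sha1(data).hexdigest()
            blob_path = os.path.join(self.root, digest + ".gz")
            if not os.path.exists(blob_path):
                # Identical data hashes to the same name, so it is stored once.
                with gzip.open(blob_path, "wb") as fh:
                    fh.write(data)
            index[key] = digest
        # Write the updated index next to the public one, then rename it
        # over the public one in a single step.
        fd, tmp_path = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(index, fh)
        os.replace(tmp_path, self.index_path)

    def get(self, key):
        """Look up the hash for `key` and return the decompressed data."""
        digest = self._load_index()[key]
        with gzip.open(os.path.join(self.root, digest + ".gz"), "rb") as fh:
            return fh.read()
```

For example, storing the same changelog under two architecture-specific keys with `put_many` leaves a single `.gz` file on disk, while `get` returns the original bytes for either key.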

== Examples ==

 * All .desktop files from all .debs in unstable & testing are available
 * Lintian results on all source & binary packages
 * md5sums of all files in all .debs

See for raw data: http://qa.debian.org/data/mole/db

Or for a very very slim web interface: http://qa.debian.org/cgi-bin/mole
Talk in Edinburgh, on Friday June 22nd 2007:
http://meetings-archive.debian.net/pub/debian-meetings/2007/debconf7/low/371_Mole_Infrastructure_for_managing_information.ogg
Line 57: Line 13:
== More information ==

The code is available for Debian Developers at qa.debian.org:/org/qa.debian.org/mole. It's also in subversion: svn.debian.org, repository "qa", subdir "mole".

The primary author is [[Jeroen]] van Wolffelaar <jeroen@debian.org>
(Also look at [[CRMI]].)
----
CategoryDatabase

Mole

Mole was a QA work-in-progress project in 2007, developed by Jeroen van Wolffelaar <jeroen@debian.org> as part of his Google Summer of Code project.
Source code and services are no longer online.

Its idea has been supplanted by UDD, the UltimateDebianDatabase.

Talk in Edinburgh, on Friday June 22nd 2007: http://meetings-archive.debian.net/pub/debian-meetings/2007/debconf7/low/371_Mole_Infrastructure_for_managing_information.ogg

(Also look at CRMI.)


CategoryDatabase