Mole development page
Each section here lists a development task. Tasks are listed roughly in the order in which they are planned to be worked on, and each should typically take no more than one day of work.
See Mole for a general description of Mole.
Near future
Define terminology
Need to come up with unambiguous terminology, particularly for datasets (dataset, database, table, ?) and for the collection of such tables that vary only in their parameters (e.g., all packagelists as opposed to the packagelist for binaries in unstable).
Status: the terminology in use is currently inconsistent; ideas exist
Rest of Google SOC plan
Implement stacking
Most notably, tables that build upon others need a better approach than bluntly redoing their work periodically: the fact that the input data changed needs to be propagated in some way.
Status: ideas are being pondered, nothing concrete coded yet
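One possible shape for this propagation, as a minimal sketch only: each table records which tables it is built from, and a change to an input marks every table downstream of it as stale, so that only those need rebuilding. The class `DependencyGraph`, its methods, and the table names used below are hypothetical and not part of mole.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Hypothetical sketch: tracks which tables are built from which inputs."""

    def __init__(self):
        # Maps an input table name to the set of tables derived from it.
        self.dependents = defaultdict(set)
        self.stale = set()

    def add_table(self, name, inputs=()):
        for source in inputs:
            self.dependents[source].add(name)

    def mark_changed(self, name):
        """Mark every table downstream of `name` as stale (breadth-first)."""
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for child in self.dependents[current]:
                if child not in self.stale:
                    self.stale.add(child)
                    queue.append(child)
        return self.stale


graph = DependencyGraph()
graph.add_table("packagelist")
graph.add_table("lintian-todo", inputs=["packagelist"])
graph.add_table("lintian-summary", inputs=["lintian-todo"])
graph.add_table("unrelated")
stale = graph.mark_changed("packagelist")
```

Here a change to the packagelist marks both derived tables as stale and leaves the unrelated table alone. How and when stale tables are then rebuilt is left open.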
Implement the web interface
Status: a dumping web interface exists; real work starts Monday, August 6th
Implement HTTP-submission with authentication
Implement HTTP-authentication on the work distributor
Integrate whatever other existing datasets are available in QA and elsewhere and are not yet in mole, as far as time permits
Write a user manual
Finalize the dev overview (generally made while programming)
Package mole and make it host-agnostic, FHS-compliant
Consider any-DD creation of databases without QA-group interference
Consider an email submission interface
More ideas
Add config checker
The config checker should also warn about unknown or unused configuration stanzas, as they might indicate typos.
Status: not started
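A minimal sketch of the "unknown stanza" half of such a check, assuming the configuration has already been parsed into a list of stanza names. The stanza names in `KNOWN_STANZAS` and the function `check_stanzas` are made up for illustration; mole's real configuration keys may differ. Detecting unused stanzas would additionally require recording which keys the code actually reads, which is not shown here.

```python
import difflib

# Hypothetical set of stanza names; mole's real configuration keys may differ.
KNOWN_STANZAS = {"Database", "Table", "Logging", "Paths"}


def check_stanzas(config_keys, known=KNOWN_STANZAS):
    """Return a list of warnings for stanzas that are not recognised."""
    warnings = []
    for key in config_keys:
        if key in known:
            continue
        message = "unknown stanza %r" % key
        # Suggest the closest known name, since an unknown stanza is often a typo.
        guesses = difflib.get_close_matches(key, sorted(known), n=1)
        if guesses:
            message += " (did you mean %r?)" % guesses[0]
        warnings.append(message)
    return warnings


warnings = check_stanzas(["Database", "Loging", "Frobnicate"])
```

The checker only reports; whether an unknown stanza should be fatal is a separate policy decision.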
Add logging configurability
For each dataset, it should be possible to define logging behaviour: log detail, and whether and for how long incoming update files are to be retained.
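As a rough sketch of what such a per-dataset setting could look like in code: the class `LoggingPolicy`, its field names, and the detail levels are assumptions made for illustration, not an existing mole interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class LoggingPolicy:
    """Hypothetical per-dataset logging settings."""
    detail: str = "normal"                 # assumed levels, e.g. "quiet", "normal", "debug"
    retain_updates: bool = False           # keep incoming update files at all?
    retention: Optional[timedelta] = None  # None means keep indefinitely

    def should_delete(self, received, now):
        """Decide whether an update file received at `received` may be removed."""
        if not self.retain_updates:
            return True
        if self.retention is None:
            return False
        return now - received > self.retention


policy = LoggingPolicy(detail="debug", retain_updates=True, retention=timedelta(days=7))
now = datetime(2007, 8, 6)
```

With this policy, an update file received more than seven days before `now` would be eligible for removal, while a more recent one would be kept.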
Define configuration format
Need to come up with a good definition for the configuration file. It'll be dak-like (using apt's config file parser).
Status: definition-by-example mostly done, still need a proper definition and documentation
Implement configuration parser
The configuration needs to be parsed into a datastructure that the rest of mole can easily use.
Status: the current system works; a proper parser would still be nice to have so that the configuration is more reliable and replaceable, but it is not at all a priority
Do something about not-yet-created tables
When tables do not yet exist, you get backtraces when trying to read them (for example, in the todo code and in various other places). It would be best to have some clean way to simply get an empty table in that case, and/or to make sure that tables are always created if they are mentioned in the config.
Status: with a hack in todo.py, not an urgent issue anymore
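The "clean way" could look something like the following sketch, which is not the hack in todo.py: a read helper that hands back an empty, read-only mapping when the table file is missing. The function `read_table` and its `loader` argument are hypothetical; `loader` stands in for whatever code actually opens a table.

```python
import os
import tempfile
from types import MappingProxyType

# A shared empty mapping that callers cannot modify by accident.
EMPTY_TABLE = MappingProxyType({})


def read_table(path, loader):
    """Return the table at `path`, or an empty read-only mapping if it does not exist yet.

    `loader` is a stand-in for whatever function actually opens the table.
    """
    if not os.path.exists(path):
        return EMPTY_TABLE
    return loader(path)


# A path inside a fresh temporary directory is guaranteed not to exist.
missing = os.path.join(tempfile.mkdtemp(), "no-such-table.db")
table = read_table(missing, loader=lambda path: {"never": "called"})
```

Readers can then iterate over the result without special-casing missing tables; writers would still need the table to be created first.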
Cleanup older cruft
Cruft can accumulate in several places. Ways to clean it up are needed:
- Transient data history: configure a cutoff date and/or a maximum number of versions
- bdb maintenance: all databases should be periodically reloaded, if for nothing else than to recover from spikes in size
- files in hashfiledbs need to be cleaned up when they are no longer referenced
- databases which are no longer configured need to be tagged for removal (probably best not automatically, though), ditto tables in there
- Items in tables that are no longer wanted might sometimes be cleaned up after some time
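For the first item, pruning by cutoff date and/or number of versions could be sketched as below. The function `prune_history`, its parameters, and the example data are illustrative assumptions; in particular, always keeping the newest entry is a design choice of this sketch, not something mole is known to do.

```python
from datetime import datetime, timedelta


def prune_history(history, now, max_age=None, max_versions=None):
    """Return the entries of `history` to keep.

    `history` is a list of (timestamp, value) pairs, oldest first.
    The newest entry is always kept so the current value is never lost.
    """
    kept = list(history)
    if max_age is not None:
        cutoff = now - max_age
        # Filter everything but the newest entry by age, then re-append the newest.
        kept = [entry for entry in kept[:-1] if entry[0] >= cutoff] + kept[-1:]
    if max_versions is not None and max_versions > 0:
        kept = kept[-max_versions:]
    return kept


now = datetime(2007, 8, 1)
history = [
    (datetime(2007, 1, 1), "version-a"),
    (datetime(2007, 6, 1), "version-b"),
    (datetime(2007, 7, 1), "version-c"),
    (datetime(2007, 7, 20), "version-d"),
]
kept = prune_history(history, now, max_age=timedelta(days=90), max_versions=2)
```

With both limits set, the age cutoff is applied first and the version limit then trims what remains.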
Done
But kept for historic reasons, for now.
Cleanup code
The current version in subversion contains a number of hacks and shortcuts; those need to be cleared away or generalized.
Status: mostly done, not worth a mention anymore
Make paths fully configurable
Ensure you do not need to edit the code if you install mole in a different location. Mole should start out by reading $HOME/.mole.conf (which can of course be a symlink), and all other paths are defined in that config file or in files referenced by it.
Status: Done, for mole itself (exceptions are logrotate.conf and worker-config)
Move package-specific code into mole jobs
Mole itself doesn't need to know about package files and archives; that work can (and should) be put into a separate mole job.
Status: done
Implement transitional datatypes
Some (many) datatypes are 'transitional', that is, the value for a given key can change over time, and it makes sense to keep a history of those changes.
Status: done, though query interface not complete yet
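To illustrate the idea, and not mole's actual storage or query interface, a transitional table can be sketched as a mapping from each key to the list of its value changes. The class `TransitionalTable` and its method names are hypothetical, and the sketch assumes that timestamps are submitted in increasing order.

```python
import bisect


class TransitionalTable:
    """Hypothetical sketch: keeps, per key, the history of value changes."""

    def __init__(self):
        self._history = {}  # key -> list of (timestamp, value), oldest first

    def set(self, key, value, timestamp):
        # Assumes timestamps arrive in increasing order for each key.
        entries = self._history.setdefault(key, [])
        # Only record a new entry when the value actually changes.
        if not entries or entries[-1][1] != value:
            entries.append((timestamp, value))

    def get(self, key):
        """Return the current value for `key`."""
        return self._history[key][-1][1]

    def get_at(self, key, timestamp):
        """Return the value `key` had at `timestamp`, or None if not yet set."""
        entries = self._history.get(key, [])
        times = [t for t, _ in entries]
        index = bisect.bisect_right(times, timestamp)
        return entries[index - 1][1] if index else None

    def history(self, key):
        return list(self._history.get(key, []))


table = TransitionalTable()
table.set("mole", "0.1", 100)
table.set("mole", "0.1", 200)  # unchanged value, so no new history entry
table.set("mole", "0.2", 300)
```

Resubmitting an unchanged value does not grow the history, which keeps periodic resubmissions cheap.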
Implement mass-submission
There needs to be a way to submit many key/values in one go, to reduce load.
Status: done
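As an illustration of the general idea only: many key/value pairs can be serialised into a single body and unpacked again on the receiving side. The line-per-pair JSON format and the functions `encode_batch` and `decode_batch` are assumptions made for this sketch; they do not describe mole's actual submission format.

```python
import json


def encode_batch(pairs):
    """Serialise many (key, value) pairs into one body, one JSON array per line."""
    return "\n".join(json.dumps([key, value]) for key, value in pairs)


def decode_batch(body):
    """Parse a body produced by encode_batch back into a list of (key, value) pairs."""
    pairs = []
    for line in body.splitlines():
        if line.strip():
            key, value = json.loads(line)
            pairs.append((key, value))
    return pairs


pairs = [("key-a", "value-a"), ("key-b", "value-b"), ("odd\tkey", "multi\nline")]
body = encode_batch(pairs)
```

Because JSON escapes embedded newlines, each pair stays on exactly one line, so one submission carries the whole batch and the receiver can process it line by line.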
Deliver talk in Edinburgh
I'm going to give a talk in Edinburgh about mole, on Friday June 22nd.
Status: done, http://meetings-archive.debian.net/pub/debian-meetings/2007/debconf7/low/371_Mole_Infrastructure_for_managing_information.ogg
Implement everything else needed to at least get a database for lintian results
Status: Well, this works again.
