Differences between revisions 1 and 2
Revision 1 as of 2008-02-29 10:55:54
Revision 2 as of 2008-02-29 10:57:15

Ultimate Debian database

  • Mentor: Lucas Nussbaum <lucas@debian.org> + preferably someone else, to share the load

  • Summary: Import all interesting data about Debian into a database and data-mine it

  • Required skills:

    • Relational model and SQL databases, both theoretical and practical knowledge. You will probably have to deal with complex queries, optimization of tables, etc.
    • Knowledge of a scripting language (Python, Ruby, Perl, ...)

There's a lot of data in Debian, in many different places: Sources and Packages files, the bug tracking system, popcon, DEHS, etc. When someone wants to combine two different kinds of data to look for discrepancies, or simply to present data in a different, more useful way, they usually have to write scripts to import this data in a usable form, and more scripts to combine it. This is *very* painful and error-prone.

The goal of this GSOC project is to move from ad-hoc scripts to a centralized approach, by implementing:

  • an SQL database where all the interesting data will be stored
  • scripts to import data from other sources to this database
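As a rough illustration of what one such import script could look like, here is a minimal sketch that parses Packages-style stanzas (blank-line-separated "Key: value" paragraphs, with indented continuation lines) and loads them into a table. Everything in it is an assumption made for the example: the table layout, the function names `parse_stanzas` and `import_packages`, and the sample data are invented, and SQLite from the Python standard library stands in for the real database so that the sketch runs without a server.

```python
import sqlite3

# Made-up sample in the style of a Packages file (not real archive data).
SAMPLE_PACKAGES = """\
Package: hello
Version: 2.2-2
Maintainer: Jane Doe <jane@example.org>
Description: example greeting program
 A longer description continues on
 indented lines.

Package: world
Version: 1.0-1
Maintainer: John Roe <john@example.org>
Description: second example package
"""


def parse_stanzas(text):
    """Yield one dict per blank-line-separated stanza."""
    stanza, last_key = {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if stanza:
                yield stanza
            stanza, last_key = {}, None
        elif line[0] in " \t" and last_key:
            # An indented line continues the previous field.
            stanza[last_key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            stanza[last_key] = value.strip()
    if stanza:
        yield stanza


def import_packages(conn, text, suite):
    """Load parsed stanzas into a hypothetical 'packages' table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS packages (
               package TEXT, version TEXT, maintainer TEXT, suite TEXT,
               PRIMARY KEY (package, version, suite))"""
    )
    rows = [
        (s["Package"], s["Version"], s.get("Maintainer"), suite)
        for s in parse_stanzas(text)
    ]
    conn.executemany("INSERT OR REPLACE INTO packages VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = import_packages(conn, SAMPLE_PACKAGES, "unstable")
print(count, "stanzas imported")
```

Keeping the parser separate from the loader means the same `parse_stanzas` helper could be reused for Sources files, and only the loader would need to change when moving to another database.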

What the student should have done at the end of the project:

  • define a working database schema and implement it in a PostgreSQL database.
  • write scripts to import data from various sources into the database (including, but not limited to, Sources and Packages files for all suites and sections, BTS data, popcon data, debtags tags, etc.).
  • write example scripts that present data in useful ways. For example:
    • RC bugs in packages in testing, sorted by popcon
    • Packages that are in unstable, but not in testing, sorted by popcon
    • Packages with the most bugs
    • Maintainers with the most bugs
    • ...
  • make it easy to move the database and the scripts to another system, and write documentation.
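To give an idea of the example queries listed above, here is a minimal sketch under stated assumptions. The three tables (`packages`, `bugs`, `popcon`), their columns, and all the rows are invented for illustration; the real schema is exactly what the project is supposed to define. SQLite is used in place of PostgreSQL so the example is self-contained, and "RC" is taken to mean bugs of severity serious, grave or critical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified schema.
conn.executescript("""
CREATE TABLE packages (package TEXT, suite TEXT);
CREATE TABLE bugs (id INTEGER PRIMARY KEY, package TEXT, severity TEXT);
CREATE TABLE popcon (package TEXT PRIMARY KEY, insts INTEGER);
""")

# Made-up rows, only there to exercise the queries.
conn.executemany("INSERT INTO packages VALUES (?, ?)", [
    ("alpha", "testing"), ("alpha", "unstable"),
    ("beta", "testing"), ("beta", "unstable"),
    ("gamma", "unstable"),
    ("delta", "unstable"),
])
conn.executemany("INSERT INTO bugs VALUES (?, ?, ?)", [
    (1, "alpha", "serious"), (2, "alpha", "normal"),
    (3, "beta", "grave"), (4, "gamma", "critical"),
])
conn.executemany("INSERT INTO popcon VALUES (?, ?)", [
    ("alpha", 500), ("beta", 9000), ("gamma", 40), ("delta", 700),
])

# RC bugs in packages in testing, sorted by popcon.
RC_IN_TESTING = """
SELECT b.id, b.package, pc.insts
FROM bugs b
JOIN packages p ON p.package = b.package AND p.suite = 'testing'
LEFT JOIN popcon pc ON pc.package = b.package
WHERE b.severity IN ('serious', 'grave', 'critical')
ORDER BY COALESCE(pc.insts, 0) DESC, b.id
"""

# Packages that are in unstable, but not in testing, sorted by popcon.
UNSTABLE_ONLY = """
SELECT u.package, COALESCE(pc.insts, 0) AS insts
FROM packages u
LEFT JOIN popcon pc ON pc.package = u.package
WHERE u.suite = 'unstable'
  AND NOT EXISTS (SELECT 1 FROM packages t
                  WHERE t.package = u.package AND t.suite = 'testing')
ORDER BY insts DESC
"""

# Packages with the most bugs.
MOST_BUGS = """
SELECT package, COUNT(*) AS n
FROM bugs
GROUP BY package
ORDER BY n DESC, package
"""

rc_in_testing = conn.execute(RC_IN_TESTING).fetchall()
unstable_only = conn.execute(UNSTABLE_ONLY).fetchall()
most_bugs = conn.execute(MOST_BUGS).fetchall()

for title, rows in [("RC bugs in testing", rc_in_testing),
                    ("Unstable only", unstable_only),
                    ("Most bugs", most_bugs)]:
    print(title, rows)
```

Once the data sits in one database, each of these reports is a single query, which is the main argument for the centralized approach over ad-hoc scripts.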