Differences between revisions 2 and 3
Revision 2 as of 2008-03-16 10:22:36
Size: 1950
Comment:
Revision 3 as of 2009-03-16 03:29:58
Size: 1947
Editor: anonymous
Comment: converted to 1.6 markup

Biological databases manager

  • Mentor: DebianMed

  • Summary: Download, process, manage and integrate biological databases into Debian.

  • Required skills:

    • Familiarity with at least one programming or scripting language.
    • Familiarity with Debian packaging.
    • Bioinformatics.
  • Description:

    • Biological databases are typically too big and too volatile to fit the traditional source/binary packaging scheme of Debian. Bioinformatics programs distributed in Debian, especially blast and emboss, can index databases, but Debian lacks a tool to install or update datasets and keep their indexes in sync. The student will have to write a program with the following features:

      • Download, update or remove a database from a remote repository.
      • Use an official list of locations, and accept additional user-supplied locations.
      • Detect which Debian packages are installed locally, and process the databases accordingly (mostly by indexing).
      • Manage the dependencies between data sets (for instance, microRNAs from release X of miRBase are mapped to versions Y and Z of the human and mouse genomes; the program should propose to install the genomes at the relevant version when installing miRBase).

      • Let the users have different versions of a database installed.
      • Suggest that users install the Debian packages that are most useful for managing the databases they downloaded.
      • Be aware of disk space issues, and let the user change the location of the databases.
      • (optional) Have a graphical user interface.
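
To make the dependency and disk-space features above more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a proposed design: the `Dataset` class, the `resolve` and `fits_on_disk` helpers, the catalog entries and all sizes are hypothetical, and the version labels X, Y and Z are the placeholders used in the description.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A versioned data set (hypothetical structure, for illustration only)."""
    name: str
    version: str
    size_bytes: int
    depends: tuple = ()  # (name, version) pairs this data set is mapped to


# Hypothetical catalog; the sizes are invented placeholders.
CATALOG = {
    ("mirbase", "X"): Dataset(
        "mirbase", "X", 50_000_000,
        depends=(("human-genome", "Y"), ("mouse-genome", "Z")),
    ),
    ("human-genome", "Y"): Dataset("human-genome", "Y", 3_000_000_000),
    ("mouse-genome", "Z"): Dataset("mouse-genome", "Z", 2_700_000_000),
}


def resolve(target, catalog, installed):
    """Return the data sets to install for `target`, dependencies first.

    `installed` is a set of (name, version) keys already present locally;
    those are skipped. Unknown keys and dependency cycles raise an error.
    """
    order = []
    seen = set(installed)

    def visit(key, stack):
        if key in seen:
            return
        if key in stack:
            raise ValueError(f"dependency cycle at {key}")
        if key not in catalog:
            raise KeyError(f"unknown data set {key}")
        for dep in catalog[key].depends:
            visit(dep, stack | {key})
        seen.add(key)
        order.append(catalog[key])

    visit(target, frozenset())
    return order


def fits_on_disk(plan, path="."):
    """Check whether the planned data sets fit in the free space at `path`."""
    needed = sum(d.size_bytes for d in plan)
    return needed <= shutil.disk_usage(path).free


if __name__ == "__main__":
    # The mouse genome is already installed, so only the human genome
    # is proposed alongside miRBase.
    plan = resolve(("mirbase", "X"), CATALOG, installed={("mouse-genome", "Z")})
    print([f"{d.name}={d.version}" for d in plan])
    # prints ['human-genome=Y', 'mirbase=X']
```

Keying the catalog on (name, version) pairs is one simple way to let several versions of the same database coexist, as the feature list requires; a real tool would also need to fetch the catalog from the configured locations.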

To apply, please contact us at debian-med@lists.debian.org.