Differences between revisions 21 and 22
Revision 21 as of 2008-01-21 12:08:18
Size: 3056
Comment: introduction added
Revision 22 as of 2008-02-25 10:47:28
Size: 3035
Comment: updated link to getData

Introduction

Software changes repeatedly, and package maintainers do their best to keep pace with upstream's progress. Preparing regular Debian packages for large databases seems inappropriate, though, since:

  • data is released frequently
    • some users demand weekly updates
    • others refer to official releases
  • some databases are large, e.g. UniProt and Pfam each exceed one gigabyte
  • updating databases will demand further operations
    • update of indices
    • ..? which depends on other tools and packages that are installed on the machine

A tool is needed to help automate the update of such databases. A first rudimentary skeleton was prepared with [http://svn.debian.org/wsvn/debian-med/trunk/community/infrastructure/getData/ getData.pl] in the Debian-Med subversion repository.
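
The concerns listed above (frequent releases, large downloads, follow-up steps such as index rebuilds) suggest a simple shape for such a tool: fetch only when upstream has a new release, then run whatever post-update steps are registered. The following is a minimal sketch of that idea only. It is not how getData.pl works, and every name in it (`Database`, `update`, `fetch`, the release label `r1`) is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Database:
    """A locally mirrored database (hypothetical structure)."""
    name: str
    installed_release: Optional[str] = None
    # Steps to run after new data arrives, e.g. rebuilding indices.
    post_update: List[Callable[[str], None]] = field(default_factory=list)


def update(db: Database, latest_release: str,
           fetch: Callable[[str, str], None]) -> bool:
    """Fetch and post-process `db` only if upstream has a different release.

    Returns True if an update was performed, False if already current.
    """
    if db.installed_release == latest_release:
        return False  # already current; avoids re-downloading large files
    fetch(db.name, latest_release)
    for hook in db.post_update:
        hook(latest_release)
    db.installed_release = latest_release
    return True


# Usage with stand-in callables instead of real downloads and indexers.
log: List[str] = []
pfam = Database("pfam", post_update=[lambda rel: log.append(f"reindex {rel}")])
fetcher = lambda name, rel: log.append(f"fetch {name} {rel}")

first = update(pfam, "r1", fetcher)   # performs fetch, then reindex
second = update(pfam, "r1", fetcher)  # no-op, release unchanged
```

Passing the download and the post-update steps in as callables keeps the release check independent of which tools happen to be installed on the machine, which is the open question in the list above.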

Public Databases that may be considered for Debian

|| Name || Contents || Licence || Package ||
|| [http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html Genbank] || Public sequences || Publicly available || ||
|| [http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE3/ BAliBASE3] || Sequence alignments version 3, and a C program for scoring || unknown, but contains a header file from clustalw, which is not free || ||
|| [http://www.compbio.dundee.ac.uk/Software/Oxbench/oxbench.html OXbench] || Multiple alignments and scoring system || || ||
|| [http://www.pseudogene.org www.pseudogene.org] || Pseudogenes || unknown || ||
|| [http://jaspar.genereg.net/ Jaspar] || Transcription factor binding sites || "Freely available" || ||
|| [http://www.bcgsc.ca:8080/oregano/Index.jsp ORegAnno] || Regulatory sequences || LGPL || ||
|| [http://www.pazar.info Pazar] || Public repository for regulatory data || Says Open Source but not found || ||
|| [http://rebase.neb.com REbase] || Restriction enzymes || ? || [http://pdb.finkproject.org/pdb/package.php/embossdb-rebase fink] ||
|| [http://www.girinst.org/repbase/index.html Repbase] || Repeat elements || Academic, registration needed || ||
|| [http://www.genetics.wustl.edu/fish_lab/repeats Zebrafish repeats] || Repeat elements (Zebrafish) || no license || ||

Many more free databases can probably be found in the database issue of Nucleic Acids Research: http://nar.oxfordjournals.org/content/vol34/suppl_1/index.dtl

We also need open-source software to warehouse the databases

|| Name || Licence || Package || Listed on microbio.wml? ||
|| [http://www.biomart.org/ BioMart] || LGPL || packaged ([http://bioinformatics.pzr.uni-rostock.de/~moeller/debian/martj/ unofficial]) || yes ||
|| [https://sourceforge.net/project/showfiles.php?group_id=147980 S3DB] || GPL || depends on PHP and MySQL/PostgreSQL || no ||
|| [http://biomoby.open-bio.org/ BioMOBY] || Artistic || depends on java || no ||
|| [http://hitkeeper.sourceforge.net/ hitkeeper] || GPL || depends on Perl and SQL || no ||
|| [http://bioinformatics.ai.sri.com/biowarehouse/ BioWarehouse] || MPL 1.1 || || no ||
|| [http://developer.berlios.de/projects/mrs/ mrs] || 4-clause BSD || complex || no ||

