Revision 28 as of 2009-04-03 05:14:16
Size: 3661
Comment: Association between PMIDs, PMCIDs, and DOIs.

Introduction

Software changes constantly, and package maintainers do their best to keep pace with upstream. Preparing regular Debian packages for large databases seems inappropriate, though, since:

  • data is released frequently
    • some users demand weekly updates
    • others refer to official releases
  • some databases are large, e.g. UniProt and Pfam each exceed one gigabyte
  • updating a database requires further operations
    • update of indices
    • ...? (depends on the other tools and packages installed on the machine)

A tool is needed to help automate these updates. A first rudimentary skeleton, getData.pl, is available in the Debian-Med Subversion repository.
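The update cycle described above (check for a newer release, download it, then refresh whatever depends on it) could be sketched roughly as follows. This is not how getData.pl works; it is a minimal Python illustration, and the names `fetch_url`, `needs_update`, `update_database` and the `post_update` parameter are all hypothetical. How the remote modification time is obtained is deliberately left to the caller.

```python
import os
import shutil
import subprocess
import urllib.request


def fetch_url(url, dest_path):
    """Download url (http, https or ftp) to dest_path."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)


def needs_update(local_path, remote_mtime):
    """Return True if the local copy is missing or older than the remote one."""
    if not os.path.exists(local_path):
        return True
    return os.path.getmtime(local_path) < remote_mtime


def update_database(url, local_path, remote_mtime, post_update=(), fetch=fetch_url):
    """Refresh local_path from url, then run follow-up commands.

    remote_mtime is the remote file's modification time in seconds since
    the epoch (assumption: the caller finds it out, e.g. from the server).
    post_update is a sequence of argument lists, such as index rebuilds.
    Returns True if an update was performed, False if nothing was done.
    """
    if not needs_update(local_path, remote_mtime):
        return False
    partial = local_path + ".part"
    fetch(url, partial)
    os.replace(partial, local_path)  # swap in only complete downloads
    os.utime(local_path, (remote_mtime, remote_mtime))
    for command in post_update:
        subprocess.run(command, check=True)  # stop if a follow-up step fails
    return True
```

Recording the remote timestamp on the local file lets a weekly cron job call `update_database` repeatedly without downloading gigabytes when nothing has changed. Which follow-up commands to run would still depend on the tools installed on the machine.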

Public Databases that may be considered for Debian

|| Name || Contents || Licence || Package || Treated by getData.pl ||
|| Genbank || Public sequences || publicly available || || ||
|| BAliBASE3 || Sequence alignments version 3, and a C program for scoring || unknown, but contains a header file from ClustalW, which is not free || || ||
|| OXbench || Multiple alignments and scoring system || || || ||
|| www.pseudogene.org || pseudogenes || unknown || || ||
|| Jaspar || Transcription factor binding sites || "Freely available" || || ||
|| ORegAnno || Regulatory sequences || LGPL || || ||
|| Pazar || public repository for regulatory data || Says Open Source but not found || || ||
|| ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz || A big table associating PMIDs, PMCIDs and DOIs for all the articles in Pubmed Central. || || || ||
|| REbase || Restriction Enzymes || ? || fink || ||
|| Repbase || Repeat elements || academic, registration needed || || ||
|| Zebrafish repeats || Repeat elements (Zebrafish) || no license || || ||
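As an illustration of how the PMC-ids.csv.gz table listed above might be used once downloaded, the following Python sketch indexes a gzipped CSV by one identifier column. The column names `PMID`, `PMCID` and `DOI` are an assumption about the file's header row and should be checked against the actual file; `load_id_table` is a hypothetical helper, not part of any existing tool.

```python
import csv
import gzip


def load_id_table(path, key="PMID"):
    """Index the rows of a gzipped CSV file by one identifier column.

    Assumes the first line of the file is a header naming the columns.
    Rows in which the key column is empty or absent are skipped.
    """
    table = {}
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = row.get(key, "")
            if value:
                table[value] = row
    return table


# Possible usage (the PMID below is a made-up placeholder):
#   ids = load_id_table("PMC-ids.csv.gz")
#   record = ids.get("12345678")
#   if record is not None:
#       print(record["PMCID"], record["DOI"])
```

Holding the whole table in a dictionary is the simplest approach but costs memory in proportion to the number of articles; a warehouse tool or an on-disk index would be the more realistic choice for routine use.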

Many more free databases can probably be found in the database issue of Nucleic Acids Research: http://nar.oxfordjournals.org/content/vol34/suppl_1/index.dtl

We also need open-source software to warehouse the databases

|| Name || Licence || Package || Listed on microbio.wml? ||
|| BioMart || LGPL || packaged (unofficial) || yes ||
|| S3DB || GPL || depends on PHP and (My|Postgre)SQL || no ||
|| BioMOBY || Artistic || depends on java || no ||
|| hitkeeper || GPL || depends on Perl and SQL || no ||
|| BioWarehouse || MPL 1.1 || || no ||
|| mrs || 4-clause BSD || complex || no ||
|| BioSQL || LGPLv3 || || no ||

Here is an interesting link about tools for biological data: http://biodatamodel.org/