= Introduction =

Software changes repeatedly, and package maintainers do their best to keep pace with upstream's progress. It seems inappropriate, though, to prepare regular Debian packages for large databases, since

 * data is released frequently
  * some users demand weekly updates
  * others refer to official releases
 * some databases are large, e.g. UniProt and Pfam are both beyond the gigabyte barrier
 * updating databases demands further operations
  * update of indices
  * ..?
 which depend on other tools and packages that are installed on the machine

A tool is needed to help automate the update of packages. A first rudimentary skeleton was prepared with [[http://svn.debian.org/wsvn/debian-med/trunk/community/infrastructure/getData/|getData.pl]] in the Debian-Med Subversion repository.

= Public Databases that may be considered for Debian =

||'''Name''' ||'''Contents''' ||'''Licence''' ||'''Package''' ||'''Treated by getData.pl'''||
||[[http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html|Genbank]] ||Public sequences ||publicly available || || ||
||[[http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE3/|BAliBASE3]] ||Sequence alignments version 3, and a C program for scoring ||unknown, but contains a header file from ClustalW, which is not free || || ||
||[[http://www.compbio.dundee.ac.uk/Software/Oxbench/oxbench.html|OXbench]] ||Multiple alignments and scoring system || || || ||
||[[http://www.pseudogene.org|www.pseudogene.org]] ||Pseudogenes ||unknown || || ||
||[[http://jaspar.genereg.net/|Jaspar]] ||Transcription factor binding sites ||"Freely available" || || ||
||[[http://www.bcgsc.ca:8080/oregano/Index.jsp|ORegAnno]] ||Regulatory sequences ||LGPL || || ||
||[[http://www.pazar.info|Pazar]] ||Public repository for regulatory data ||Says Open Source, but licence not found || || ||
||ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz ||A big table associating PMIDs, PMCIDs and DOIs for all the articles in PubMed Central ||  || || ||
||[[http://rebase.neb.com|REbase]] ||Restriction enzymes ||? ||[[http://pdb.finkproject.org/pdb/package.php/embossdb-rebase|fink]] || ||
||[[http://www.girinst.org/repbase/index.html|Repbase]] ||Repeat elements ||academic, registration needed || || ||
||[[http://www.genetics.wustl.edu/fish_lab/repeats|Zebrafish repeats]] ||Repeat elements (Zebrafish) ||no licence || || ||

Many free databases can probably be found in the database issue of Nucleic Acids Research: http://nar.oxfordjournals.org/content/vol34/suppl_1/index.dtl

We also need open-source software to warehouse the databases:

||'''Name''' ||'''Licence''' ||'''Package''' ||
||[[https://sourceforge.net/project/showfiles.php?group_id=147980|S3DB]] ||GPL ||depends on PHP and (My|Postgre)SQL ||
||[[http://biomoby.open-bio.org/|BioMOBY]] ||Artistic ||depends on Java ||
||[[http://hitkeeper.sourceforge.net/|hitkeeper]] ||GPL ||depends on Perl and SQL ||
||[[http://bioinformatics.ai.sri.com/biowarehouse/|BioWarehouse]] ||MPL 1.1 || ||
||[[http://developer.berlios.de/projects/mrs/|mrs]] ||4-clause BSD ||complex ||
||[[http://biosql.org/DIST/|BioSQL]] ||LGPLv3 || ||

Here is an interesting link about tools for biological data: http://biodatamodel.org/

----
 . Back to [[DebianScience/Biology]]