Large dataset manager
Mentor: Debian Med
Summary: Download, process, manage and integrate large public datasets into Debian.
Required skills:
- Familiarity with at least one programming or scripting language.
- Familiarity with Debian packaging.
- Bioinformatics or expertise in another field using large public datasets.
Description:
Large public datasets, such as bioinformatics databases, are typically too big and too volatile to fit Debian's traditional source/binary packaging scheme. Some programs distributed in Debian, like blast and emboss, can index specialised databases, but Debian lacks a tool to install or update the datasets they need and keep their indexes in sync. The Debian Med project is looking for a student interested in managing local copies of large datasets using the same paradigms as software management in the Debian operating system. We encourage the design of a tool that works across multiple fields of interest (not only biology) and operating systems (not only Debian).
As a starting point or a source of inspiration, students can have a look at the getData programs, with which we are currently exploring the issues of data management.
To apply, please contact us at debian-med@lists.debian.org.
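To make the "install or update, then keep the index in sync" requirement more concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not the design of getData or of any existing Debian tool. The names DatasetState and plan_actions, the dataset name and the version strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetState:
    """Hypothetical record of what is known about one dataset."""
    name: str
    upstream_version: str             # latest release announced by the provider
    installed_version: Optional[str]  # local copy, None if nothing is installed
    indexed_version: Optional[str]    # version the local index was built from


def plan_actions(state: DatasetState) -> List[str]:
    """Return the steps needed to bring the local copy and its index up to date."""
    actions: List[str] = []
    if state.installed_version != state.upstream_version:
        # A new download always invalidates the existing index.
        actions.append("download")
        actions.append("reindex")
    elif state.indexed_version != state.installed_version:
        # Data is current, but the index was built from an older copy.
        actions.append("reindex")
    return actions


if __name__ == "__main__":
    # Assumed example values, for illustration only.
    stale_index = DatasetState("example-db", "2.0", "2.0", "1.0")
    print(plan_actions(stale_index))
```

The point of the sketch is that the index version is tracked separately from the installed version, much as a package manager tracks a package's state separately from its triggers. A real tool would also need checksums, partial downloads and per-program index formats.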