Tagging biomedical packages with EDAM

The file debian/upstream/edam allows a package to be formally categorised with concepts from the EDAM ontology (https://bioportal.bioontology.org/ontologies/EDAM?p=classes).

Like the other files in the debian/upstream directory, it is written in YAML. Use the command-line YAML Lint for consistent validation. NB: the command-line version is less strict than the online version, which also rewrites the file into a rather ugly layout.

For source packages that build several binary packages needing different EDAM annotations, or whose main binary package is not named like the source package, it is suggested to name the EDAM file packagename.edam, where packagename is the binary package being annotated.
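The lookup implied by this naming rule could be sketched as follows. The sketch rests on two assumptions:
- The per-package files are placed next to edam in debian/upstream/.
- A binary package without its own file falls back to the generic debian/upstream/edam.

Both helper functions are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the EDAM file that annotates binary package $1.
# Assumes per-package files are named <package>.edam and live in the same
# directory as the generic edam file (default: debian/upstream).
edam_file_for() {
    dir="${2:-debian/upstream}"
    if [ -f "$dir/$1.edam" ]; then
        echo "$dir/$1.edam"     # package-specific annotation
    elif [ -f "$dir/edam" ]; then
        echo "$dir/edam"        # fall back to the source-wide annotation
    else
        return 1                # no EDAM annotation at all
    fi
}

# Print "<binary package> <edam file>" for every Package: stanza of a
# debian/control file ($1); packages without any annotation show "none".
list_edam_files() {
    control="${1:-debian/control}"
    updir="${2:-debian/upstream}"
    for pkg in $(awk '/^Package:/ {print $2}' "$control"); do
        printf '%s %s\n' "$pkg" "$(edam_file_for "$pkg" "$updir" || echo none)"
    done
}
```

Running `list_edam_files` in an unpacked source package gives a quick overview of which binary packages still lack an annotation.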

The context of this development is the emerging bio.tools database of the European ELIXIR project. A set of scripts for the automated upload of a bio.tools-ready description built from the Debian database (edam, control, copyright, changelog) has already been implemented and is available:

These scripts link the Debian package archive to entries in bio.tools, so that anyone searching for a particular tool may become aware of Debian-provided binaries more quickly. Especially in the biological and medical sciences, the confidence of using the same binary as others is of particular value. Web services may be an alternative, but for many of today's high-throughput datasets, I/O is a bottleneck.

Issues

Format description

The example below is borrowed from the debian/upstream/edam file of the prospective Debian package condetri, which in turn borrowed from trimmomatic. Its first entry, ontology, identifies the ontology and the version the file refers to. As is typical for EDAM, the whole package then has a single topic. There may be several scopes, but usually there is just one, a summary such as this one.

---
ontology: EDAM (1.12)
topic:
  - Sequencing
scopes:
  - name: summary
    function:
      - Sequence trimming
      - Sequencing quality control
    inputs:
      - data:   Sequence
        formats: [FASTQ]
    outputs:
      - data:   Sequence
        formats: [FASTQ]
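A rough sanity check for the top-level structure shown above could look like this. It is only a sketch using grep, not a YAML parser, so it complements rather than replaces a linter. The function name check_edam_keys is hypothetical.

```shell
#!/bin/sh
# Rough sanity check (not a YAML parser): does an EDAM file declare the
# top-level keys ontology, topic and scopes? check_edam_keys is hypothetical.
check_edam_keys() {
    for key in ontology topic scopes; do
        if ! grep -q "^${key}:" "$1"; then
            echo "$1: missing top-level key '$key'" >&2
            return 1
        fi
    done
    # the ontology entry should name EDAM, e.g. "ontology: EDAM (1.12)"
    if ! grep -q '^ontology:[[:space:]]*EDAM' "$1"; then
        echo "$1: ontology is not EDAM" >&2
        return 1
    fi
}
```

For example, `check_edam_keys debian/upstream/edam` returns 0 for a file laid out like the condetri example. It returns 1, with a message on stderr, if a key is missing.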

For some software suites, EMBOSS for instance, it may be appropriate to use several scopes to describe separate binaries. Each scope lists functions, together with their inputs and outputs.
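To see how an annotated suite is split into scopes, a small awk filter can list the scope names. This sketch assumes the layout of the condetri example: scopes is a top-level key, and each entry starts with "  - name:". The helper list_edam_scopes is hypothetical.

```shell
#!/bin/sh
# List the scope names of an EDAM file, one per line.
# Assumes scopes is a top-level key and each entry begins with "  - name:".
# list_edam_scopes is a hypothetical helper, not a Debian tool.
list_edam_scopes() {
    awk '
        /^scopes:/ { in_scopes = 1; next }
        /^[^ ]/    { in_scopes = 0 }    # another top-level key ends the block
        in_scopes && /^  - name:/ {
            sub(/^  - name:[ ]*/, "")
            print
        }' "$1"
}
```

For the condetri example, this prints the single line summary. A multi-binary suite annotated with one scope per program would print one name per scope.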

Examples

A number of packages already carry an EDAM annotation. For a head start, you may want to adopt the terms used for a similar program:

This list is not complete.

Tools to help organise EDAM annotations

# fetch the EDAM query script from the biotoolsConnect repository
wget -O edam_query.sh https://raw.githubusercontent.com/bio-tools/biotoolsConnect/master/DebianMed/edam.sh
chmod +x edam_query.sh
# install the PostgreSQL client if it is not installed yet
[ -x /usr/bin/psql ] || sudo apt-get install postgresql-client
# query UDD and write the results to edam.txt
./edam_query.sh

This produces a file named edam.txt containing everything Debian currently knows about EDAM, and more. It almost feels worthy of being uploaded to bio.tools :)

$ head -n 3 edam.txt  | tail -n 1
 abacas                              | debian       | sid     | main      | 1.3.1                     | abacas                           | http://abacas.sourceforge.net/                                                                                                                                                                                | Algorithm Based Automatic Contiguation of Assembled Sequences                   |  ABACAS is intended to rapidly contiguate (align, order, orientate),                                                      +|                         |                                      |                                                                       |                                                                                                   | 8 / 11 / 168   | 10.1093/bioinformatics/btp347                                       | {"Probes and primers"}                                 | [{"name": "summary", "inputs": [{"data": "Sequence", "formats": ["FASTA"]}], "outputs": [{"data": "Sequence", "formats": ["FASTA"]}], "function": ["PCR primer design"]}]

You can also get JSON output by calling the script with the -j option:

$ ./edam_query.sh -j

This script is not intended as a full-fledged tool, but rather as an example of a UDD (Ultimate Debian Database) query that could be turned into one.

See also