Upstream MEtadata GAthered with YAml (UMEGAYA)

History

UMEGAYA was an effort to collect meta-information about upstream projects in a file called debian/upstream.metadata in source packages maintained in a publicly accessible version control system (at that time Subversion or Git). Since this information can be accessed directly from the VCS, the idea was that it could be updated without uploading the source packages to the Debian archive.

This experiment evolved into DEP-12, where the file collecting the meta-information is now called debian/upstream/metadata.

First proof of principle

The DebianMed web sentinels use the UltimateDebianDatabase (UDD) to display bibliographic information about which academic article to cite when using our packages. This was previously done by collecting the information from the central file used to create the med-* metapackages (See the Bits from Debian Pure Blends of October 2012).

A UDD importer was developed, consisting of a gatherer and a UDD module. The current importer is in https://salsa.debian.org/qa/udd/-/blob/master/udd/upstream_reader.py.

The bibliographic data is loaded into the bibref table of the UltimateDebianDatabase. The following UDD query outputs all source packages that have bibliographic information. (The join is needed to exclude references for packages that have not yet been uploaded to the Debian package pool but are used in so-called Blends prospective packages.)

SELECT distinct s.source from bibref b join sources s on s.source = b.source;
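The effect of the join can be tried out locally without access to UDD. The following is only a sketch using an in-memory SQLite database: the two toy tables stand in for the real UDD tables, and every column other than source (here bibkey and bibvalue), as well as the package names, is made up for illustration.

```python
import sqlite3

# Toy stand-ins for the UDD tables. Only the `source` column comes from the
# query above; the other columns and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sources (source TEXT);
    CREATE TABLE bibref  (source TEXT, bibkey TEXT, bibvalue TEXT);
    INSERT INTO sources VALUES ('pkg-a'), ('pkg-a'), ('pkg-b');
    INSERT INTO bibref VALUES
        ('pkg-a', 'title', 'Example title'),
        ('pkg-a', 'year', '2012'),
        ('pkg-prospective', 'title', 'Not yet uploaded');
""")

# Same query as above, with ORDER BY added so the output is stable.
rows = conn.execute(
    "SELECT DISTINCT s.source FROM bibref b "
    "JOIN sources s ON s.source = b.source ORDER BY s.source"
).fetchall()
packages = [row[0] for row in rows]
print(packages)  # ['pkg-a']
```

pkg-prospective has references but no row in sources, so the join drops it; pkg-b is in sources but has no references, so it is not listed either.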

Syntax

The debian/upstream/metadata file is in YAML format. Its syntax is now specified in DEP 12.

Fields

The fields are described in DEP 12.

Discussion

Discussion took place here, on a mailing list (debian-med or debian-qa), or a discussion page, if available.

The data is not really Debian-specific; let's put it outside Debian and use mechanisms for Mapping package names across distributions.

To do: formalise the above using Config::Model::Backend::Yaml, and generate docs as explained in http://ddumont.wordpress.com/2011/04/08/configuration-doc-generation-with-configmodel/ .

* In addition to DOAP, other Semantic Web ontologies/namespaces/schemas should be reused in order not to reinvent the wheel, and to enable such metadata to participate in the Semantic Web (see also Open Linked Data matters). As such, SPDX would be an interesting standard to link to, as well as ADMS.F/OSS, for package descriptions, IMHO. Syntactically, any form of RDF would be interesting to explicitly convey the prefixes in the field names... and I'm not sure it can be done in YAML -- OlivierBerger

BenFinney asks (2019-01-21): Which version of the YAML specification? Please link to the exact YAML specification that rules this format.

Problems

BibTeX: Currently there is no way to pass braces ({}), which may be used to force capitalisation in BibTeX entries, through from debian/upstream/metadata into BibTeX.

It seems that the Python library used to parse debian/upstream/metadata files for inclusion into UDD has a bug when values are of the form <d:d> (decimal_number colon decimal_number). You should enclose such strings in single quotes. (See the discussion on the Debian Med mailing list.)
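One possible explanation, given here as an assumption rather than a confirmed diagnosis, is that YAML 1.1 parsers read unquoted values such as 1:30 as base-60 integers instead of strings. Whatever the cause, such values can be spotted before upload. The sketch below is a line-based heuristic, not a YAML parser; the function name and the sample text are made up for illustration.

```python
import re

# Matches "Key: 12:34" where the whole value is digits, a colon and digits,
# with no surrounding quotes. Comment lines (starting with #) are ignored.
UNQUOTED_D_COLON_D = re.compile(r"^\s*[^#\s][^:]*:\s+(\d+:\d+)\s*$")


def find_unquoted_colon_numbers(text):
    """Return (line_number, value) pairs for unquoted values like 1:30."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = UNQUOTED_D_COLON_D.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# Hypothetical file content: only the Pages value is unquoted.
sample = "Reference:\n  Volume: 12\n  Pages: 1:30\n  Number: '4:5'\n"
print(find_unquoted_colon_numbers(sample))  # [(3, '1:30')]
```

The quoted value '4:5' is not reported, which is the form recommended above.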

Examples

Here is an example template for a debian/upstream/metadata file which can be used to specify citations:

Reference:
  Author: <please use full names and separate multiple authors with the keyword "and">
  Title:
  Journal:
  Year:
  Volume:
  Number:
  Pages:
  DOI:
  PMID:
  URL:
  eprint:

You can find lots of real examples using codesearch.debian.net.
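To give an idea of how the Reference fields relate to a BibTeX entry, here is a minimal sketch. It assumes that each field maps to the BibTeX field of the same name in lower case, which is an assumption made for this illustration and not necessarily what the UDD tooling does. The function name, the citation key and the sample values are placeholders, not a real publication.

```python
def reference_to_bibtex(cite_key, reference):
    """Render a Reference mapping as an @article entry (illustrative only)."""
    lines = ["@article{%s," % cite_key]
    for field, value in reference.items():
        if value in (None, ""):
            continue  # skip template fields that were left empty
        # Assumed mapping: field name in lower case, value wrapped in braces.
        lines.append("  %s = {%s}," % (field.lower(), value))
    lines.append("}")
    return "\n".join(lines)


# Placeholder values, as they might look after parsing the YAML file.
reference = {
    "Author": "Jane Doe and John Roe",
    "Title": "An example title",
    "Journal": "Journal of Examples",
    "Year": 2012,
    "Volume": "",
}
entry = reference_to_bibtex("doe2012", reference)
print(entry)
```

Empty fields such as Volume above are left out of the generated entry.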

Errors

The most common error is using the string ": " inside an unquoted YAML value. This sequence separates keys from values, so the parser misreads the line. Please quote such values or put them on a separate line.
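This error can be caught with a quick check before running a full validator. The sketch below is a rough line-based heuristic, not a YAML parser: it flags lines whose value is unquoted and itself contains ": ". The function name and the sample content are made up for illustration.

```python
def find_unquoted_colon_space(text):
    """Return line numbers whose unquoted value itself contains ': '."""
    suspicious = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Split at the first ': ', which YAML takes as the key separator.
        key, separator, value = stripped.partition(": ")
        value = value.strip()
        if not separator or not value:
            continue
        if value[0] in "'\"|>":
            continue  # quoted or block scalar: ': ' is harmless here
        if ": " in value:
            suspicious.append(number)
    return suspicious


sample = (
    "Reference:\n"
    "  Title: Foo: a tool for bar\n"
    "  Journal: 'Baz: the journal'\n"
    "  Year: 2012\n"
)
print(find_unquoted_colon_space(sample))  # [2]
```

Only the unquoted Title line is reported; the quoted Journal value is accepted.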

If in doubt about the validity of the YAML file you wrote, validators are available, either online (Online YAML Parser) or on the command line (yamllint).

Lintian check

Simon Kainz has written a preliminary Lintian check to verify the syntax of debian/upstream/metadata files (see also #731340). Any testing is welcome. A simple Lintian check for YAML syntax was implemented by Petter Reinholdtsen (see #813904). Andrius Merkys is working on a validator for values in debian/upstream/metadata files.

Deprecated features

Deprecated fields

According to DEP 5, these fields belong in debian/copyright and should not be duplicated in debian/upstream/metadata:

Name *
Upstream name of the packaged work.
Contact
The person, mailing list, forum, etc. to which messages should be sent in the first place.

Yet it was objected that these fields must still be allowed, as not all packagers wish to use DEP 5.

Hyphen shortcut for mappings

Only a subset of YAML is used: sequences are only expected to contain scalars, and mappings are only expected to contain a scalar or a mapping, with only one level of nesting.

In addition, two conventions that are not part of the YAML format were proposed and used in the umegaya gatherer, but have since been abandoned and are no longer used:

Foo:
  Bar: baz

Foo-Bar: baz
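Assuming, as the section title suggests, that Foo-Bar: baz was meant as a shortcut for the nested form shown before it, the expansion can be sketched as follows. This is not the code of the umegaya gatherer; the function name and the sample keys are made up for illustration.

```python
def expand_hyphen_keys(flat):
    """Turn {'Foo-Bar': 'baz'} into {'Foo': {'Bar': 'baz'}} (one level only)."""
    nested = {}
    for key, value in flat.items():
        # Split at the first hyphen only; keys without one are kept as-is.
        outer, hyphen, inner = key.partition("-")
        if hyphen and inner:
            nested.setdefault(outer, {})[inner] = value
        else:
            nested[key] = value
    return nested


expanded = expand_hyphen_keys({"Foo-Bar": "baz", "Name": "example"})
print(expanded)  # {'Foo': {'Bar': 'baz'}, 'Name': 'example'}
```

Note that this sketch splits every key containing a hyphen, so a field name that legitimately contains one would be split as well.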

Other Upstream metadata

EDAM files

The EDAM ontology provides a means to classify software used in bioinformatics. The Debian Med team intends to link all bioinformatics tools with the EDAM ontology. To this end, the YAML file debian/upstream/edam can provide extra information.

Fields

ontology
EDAM (1.13) (currently version 1.13 is the latest EDAM version)
topic
EDAM topic
scopes
EDAM scopes

AppStream

AppStream was initially conceived to provide a user-visible app store for the desktop. As such, there is overlap in the project metadata, but not all the fields listed above are supported.