Upstream MEtadata GAthered with YAml (UMEGAYA)
History
UMEGAYA was an effort to collect meta-information about upstream projects in a file called debian/upstream.metadata in source packages maintained in a publicly accessible version control system (at that time Subversion or Git). Since this information can be accessed directly from the VCS, the idea was that it could be updated without uploading the source packages to the Debian archive.
This experiment evolved into DEP-12, where the file collecting the meta-information is now called debian/upstream/metadata.
First proof of principle
The DebianMed web sentinels use the UltimateDebianDatabase (UDD) to display bibliographic information about which academic article to cite when using our packages. This was previously done by collecting the information from the central file used to create the med-* metapackages (See the Bits from Debian Pure Blends of October 2012).
A UDD importer was developed, consisting of a gatherer and a UDD module. The current importer is in https://salsa.debian.org/qa/udd/-/blob/master/udd/upstream_reader.py.
The bibliographic data is loaded into the bibref table of the UltimateDebianDatabase. The following UDD query outputs all source packages featuring bibliographic information. (The join is needed to exclude references for packages that have not yet been uploaded to the Debian package pool but are used in so-called Blends prospective packages.)
SELECT distinct s.source from bibref b join sources s on s.source = b.source;
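To make the effect of the join concrete, here is a minimal sketch that runs the same query against an in-memory SQLite database. The table layouts are simplified assumptions (only the columns the query needs), and the package names and values are hypothetical; the real UDD is a PostgreSQL database with more columns.

```python
import sqlite3

# Toy stand-ins for the UDD tables (simplified, hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sources (source TEXT, release TEXT);
    CREATE TABLE bibref  (source TEXT, key TEXT, value TEXT);
    -- 'pkg-a' is in the archive (two releases); 'pkg-b' is prospective only.
    INSERT INTO sources VALUES ('pkg-a', 'sid'), ('pkg-a', 'stable');
    INSERT INTO bibref VALUES
        ('pkg-a', 'doi',  '10.0000/example-a'),
        ('pkg-a', 'year', '2012'),
        ('pkg-b', 'doi',  '10.0000/example-b');
""")

QUERY = """
    SELECT DISTINCT s.source
    FROM bibref b
    JOIN sources s ON s.source = b.source
"""

# The inner join drops 'pkg-b' because it has no row in sources;
# DISTINCT collapses the repeated 'pkg-a' rows.
packages = [row[0] for row in conn.execute(QUERY)]
print(packages)  # ['pkg-a']
```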
Syntax
The debian/upstream/metadata file is in YAML format. Its syntax is now specified in DEP 12.
Fields
The fields are described in DEP 12.
Discussion
Discussion took place here, on a mailing list (debian-med or debian-qa), or a discussion page, if available.
The data is not really Debian-specific; let's put it outside Debian and use mechanisms for Mapping package names across distributions.
To do: formalise the above using Config::Model::Backend::Yaml, and generate docs as explained in http://ddumont.wordpress.com/2011/04/08/configuration-doc-generation-with-configmodel/ .
In addition to DOAP, other Semantic Web ontologies/namespaces/schemas should be reused in order not to reinvent the wheel, and to enable such metadata to participate in the Semantic Web (see also Open Linked Data matters). As such, SPDX would be an interesting standard to link to, as well as ADMS.F/OSS, for package descriptions, IMHO. Syntactically, any form of RDF would be interesting to explicitly convey the prefixes in the field names... and I'm not sure it can be done in YAML -- OlivierBerger
BenFinney asks (2019-01-21): Which version of the YAML specification? Please link to the exact YAML specification that rules this format.
Problems
BibTeX: Currently there is no way to pass the braces {} that may be used to force capitalisation in BibTeX entries through from debian/upstream/metadata into BibTeX.
It seems that the Python library used to parse debian/upstream/metadata files for inclusion into UDD has a bug when values are of the form <d:d> (decimal_number colon decimal_number). You should enclose strings like this in single quotes. (See the discussion on the Debian Med mailing list.)
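The quoting advice for values of the form d:d can be sketched as a small helper. The function name and sample values are hypothetical. The cause of the bug is not stated here; one plausible explanation, offered only as an assumption, is YAML 1.1's base-60 integer notation, under which a parser may read 1:30 as the number 90.

```python
import re

# decimal_number colon decimal_number, e.g. 11:28
NUMBER_COLON_NUMBER = re.compile(r"\d+:\d+")

def quote_if_ambiguous(value: str) -> str:
    """Wrap value in single quotes if it has the form <d:d>."""
    if NUMBER_COLON_NUMBER.fullmatch(value):
        return "'" + value + "'"
    return value

# Hypothetical field values; only the first one is rewritten.
for raw in ["11:28", "2012", "R1-R12"]:
    print(f"Pages: {quote_if_ambiguous(raw)}")
```

Values that are already quoted do not match the pattern and are returned unchanged.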
Examples
Here is an example template for a debian/upstream/metadata file which can be used to specify citations:
Reference:
  Author: <please use full names and separate multiple authors by the keyword "and">
  Title:
  Journal:
  Year:
  Volume:
  Number:
  Pages:
  DOI:
  PMID:
  URL:
  eprint:
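To illustrate how the fields of such a Reference block map onto a citation, here is a minimal sketch that renders an already-parsed Reference mapping as a BibTeX entry. This is not the code used by UDD or the web sentinels; the function name, the citation key and all field values are hypothetical.

```python
def to_bibtex(key: str, ref: dict) -> str:
    """Render a Reference mapping as a BibTeX @article entry."""
    lines = [f"@article{{{key},"]
    for field, value in ref.items():
        if value in (None, ""):
            continue  # skip template fields that were left empty
        lines.append(f"  {field.lower()} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)

# Hypothetical reference, as it might look after parsing the YAML file.
reference = {
    "Author": "Jane Doe and John Roe",
    "Title": "An example tool",
    "Journal": "Journal of Examples",
    "Year": 2012,
    "Volume": "",
    "Pages": "11-28",
}
entry = to_bibtex("doe2012", reference)
print(entry)
```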
You can find lots of real examples using codesearch.debian.net.
Errors
The most common error is using the string ": " inside a YAML value. This is not allowed in an unquoted value, since that string separates keys from values. So please quote such values or use a separate line.
If in doubt about the validity of the YAML file you wrote, validators are available, either online (Online YAML Parser) or on the command line (yamllint).
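As a rough illustration of the ": " problem, the following sketch flags lines whose unquoted value contains a second ": ". It is a line-based heuristic, not a YAML parser, so it will miss or misreport some constructs (flow mappings, for example); a real validator such as yamllint remains the better tool. The function name and sample content are hypothetical.

```python
def find_unquoted_colons(text: str) -> list[int]:
    """Return 1-based line numbers whose unquoted value contains ': '."""
    suspicious = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        _key, sep, value = stripped.partition(": ")
        if not sep:
            continue  # no "key: value" pair on this line
        value = value.strip()
        if value[:1] in ("'", '"', "|", ">"):
            continue  # quoted or block scalar: ': ' is harmless here
        if ": " in value:
            suspicious.append(number)
    return suspicious

sample = """\
Reference:
  Title: Example: a hypothetical tool
  Journal: 'Examples: Series A'
  URL: https://example.org/paper
"""
print(find_unquoted_colons(sample))  # [2]
```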
Lintian check
Simon Kainz has written a preliminary lintian check to verify the syntax of debian/upstream/metadata files (see also 731340). Any testing is welcome. A simple lintian check for YAML syntax was implemented by Petter Reinholdtsen (see 813904). Andrius Merkys is working on a validator for values in debian/upstream/metadata files.
Deprecated features
Deprecated fields
According to DEP 5, these fields belong to debian/copyright and should not be duplicated in debian/upstream/metadata:
- Name *
- Upstream name of the packaged work.
- Contact
- The person, mailing list, forum,… to which messages should be sent in the first place.
Yet it was objected that these fields must still be allowed, as not all packagers wish to use DEP 5.
Hyphen shortcut for mappings
Only a subset of YAML is used: sequences are only expected to contain scalars, and mappings are only expected to contain a scalar or a mapping, with only one level of nesting.
In addition, two conventions that are not part of the YAML format were proposed and used in the umegaya gatherer, but have since been abandoned and are no longer used:
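One possible reading of this subset rule can be sketched as a check on already-parsed data. The interpretation is an assumption: the file is a top-level mapping whose values are scalars, sequences of scalars, or mappings nested one level deep. The function names are hypothetical.

```python
SCALARS = (str, int, float, bool, type(None))

def is_scalar(value) -> bool:
    return isinstance(value, SCALARS)

def in_subset(data, depth: int = 0) -> bool:
    """Check parsed YAML data against one reading of the subset rule."""
    if is_scalar(data):
        return True
    if isinstance(data, list):
        # Sequences may only contain scalars.
        return all(is_scalar(item) for item in data)
    if isinstance(data, dict):
        # depth 0 is the top-level mapping, depth 1 the single nested level.
        if depth > 1:
            return False
        return all(in_subset(value, depth + 1) for value in data.values())
    return False

print(in_subset({"Name": "example", "Reference": {"Year": 2012}}))  # True
print(in_subset({"A": {"B": {"C": "too deep"}}}))                   # False
```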
- Field names are case-insensitive.
- Nested mappings are shortcuts for longer field names composed of both mapping field names separated by a dash. The following two examples are equivalent:
Foo:
  Bar: baz
Foo-Bar: baz
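The two abandoned conventions can be illustrated with a short sketch that normalises already-parsed data: field names are lower-cased and one level of nested mappings is flattened into dash-joined names. This is not the gatherer's actual code; the function name is hypothetical.

```python
def expand_shortcuts(data: dict) -> dict:
    """Flatten one level of nested mappings into dash-joined, lower-cased names."""
    flat = {}
    for key, value in data.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}-{sub_key}".lower()] = sub_value
        else:
            flat[key.lower()] = value
    return flat

nested = {"Foo": {"Bar": "baz"}}   # nested mapping form
dashed = {"Foo-Bar": "baz"}        # dash-joined field name form

# Both forms normalise to the same field.
print(expand_shortcuts(nested))  # {'foo-bar': 'baz'}
print(expand_shortcuts(dashed))  # {'foo-bar': 'baz'}
```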
Other Upstream metadata
Edam files
The EDAM ontology provides a means to classify software used in bioinformatics. The Debian Med team intends to link all bioinformatics tools with the EDAM ontology. To this end, the YAML file debian/upstream/edam can provide extra information.
Fields
- ontology
- EDAM (1.13) (currently version 1.13 is the latest EDAM version)
- topic
- EDAM topic
- scopes
- EDAM scopes
AppStream
AppStream was initially conceived to provide a user-visible app store for the desktop. As such, there is overlap in the project metadata, but not all the fields listed above are supported.