Differences between revisions 118 and 119
Revision 118 as of 2016-10-15 18:54:39
Size: 14399
Editor: BenFinney
Comment: Ask for clarification on some fields.
Revision 119 as of 2016-10-16 02:40:58
Size: 14219
Editor: PaulWise
Comment: fix fixmes

Upstream MEtadata GAthered with YAml (UMEGAYA)

Help! There is a bug that I cannot solve by myself. -- Charles

/!\ This proposal is for all packages, not just science packages. Please ignore the fields that do not apply to your package.

Introduction

This is an effort to collect meta-information about upstream projects in a file called debian/upstream/metadata in source packages maintained in a publicly accessible version control system (VCS), currently Subversion or Git. Since this information is read directly from the VCS, it can be updated without uploading the source packages to the Debian archive.

Umegaya is also the name of a draft collector system implemented at http://upstream-metadata.debian.net. Its source is available on git.debian.org and Branchable. It is used to feed the data into the UltimateDebianDatabase.

Proof of principle

The goal is to make the DebianMed web sentinels use the UDD, fed from the debian/upstream/metadata files via upstream-metadata.debian.net, to display bibliographic information about which academic article to cite when using our packages. This is currently done by collecting the information in the central file used to create the med-* metapackages. This work was announced in October 2012 in the Bits from Debian Pure Blends.

The Umegaya instance running at http://upstream-metadata.debian.net collects and organises debian/upstream/metadata and debian/copyright files as pools. Currently they are pushed daily to the packages-metadata directory of the QA team's Subversion repository. A UDD importer consisting of a gatherer and a UDD module is in development.

The bibliographic data is loaded into the bibref table of the UltimateDebianDatabase. The following UDD query outputs all source packages featuring bibliographic information. (The join is needed to exclude references from packages that have not yet been uploaded to the Debian package pool but are used in so-called Blends prospective packages.)

SELECT distinct s.source from bibref b join sources s on s.source = b.source;
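To see what the join does without access to UDD, you can replay the same query against a throwaway SQLite database using Python's standard library. This is only a sketch. The two tables are reduced to the columns the query touches (the real UDD tables have more), and the package names and DOIs are invented.

```python
import sqlite3

# Toy stand-ins for the UDD tables; only the columns used by the query exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sources (source TEXT, version TEXT);
    CREATE TABLE bibref (source TEXT, key TEXT, value TEXT);
""")
conn.executemany(
    "INSERT INTO sources VALUES (?, ?)",
    [("example-a", "1.0-1"), ("example-b", "2.3-1")],  # hypothetical packages
)
conn.executemany(
    "INSERT INTO bibref VALUES (?, ?, ?)",
    [
        ("example-a", "doi", "10.1000/example.1"),
        ("example-a", "year", "2012"),
        # A prospective package: it has a reference but no row in sources.
        ("prospective-c", "doi", "10.1000/example.2"),
    ],
)

# DISTINCT collapses the one-row-per-key layout of bibref; the JOIN drops
# references whose source package is not in the archive.
rows = conn.execute(
    "SELECT DISTINCT s.source FROM bibref b JOIN sources s ON s.source = b.source"
).fetchall()
print(rows)  # [('example-a',)]
```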

Syntax

This syntax is being formalised as DEP 12.

The debian/upstream/metadata file is in YAML format. In its simplest form, it looks much like the paragraph format used in Debian control files. Nevertheless, there can sometimes be unexpected behaviour. For instance, field values that contain a colon must be quoted in some cases. If in doubt, use a validator, either online (Online YAML Parser) or on the command line (yamllint).

Fields

Fields are listed in alphabetical order. Let's try to use the same vocabulary as DOAP as much as possible.

Fields that are the same as in DOAP are marked with an asterisk (“*”) after the field name.

Archive
The name of the large archive that the upstream work is part of, like CPAN.
ASCL-Id

Identification code in the Astrophysics Source Code Library

Bug-Database
A URL to the list of known bugs for the project.
Bug-Submit
A URL that is the place where new bug reports should be sent.
Changelog
URL to the upstream changelog.
Cite-As

The way the authors want their software to be cited in publications. The value is a string, which may contain a link in valid HTML syntax (see the discussion on the Debian Science list).

Contact
The person, mailing list, forum, etc. to which messages should be sent in the first place.
CPE

One or more space-separated Common Platform Enumeration (CPE) values, useful for looking up relevant CVEs in the National Vulnerability Database and other CVE sources. See CPEtagPackagesDep for information on how this information can be used. Example: "cpe:/a:ethereal_group:ethereal"

Donation
A URL to a donation form (or instructions).
FAQ
A URL to the online FAQ.
Funding
One or more sources of funding which have supported this project (e.g. NSF OCI-12345).
Gallery
A URL to a gallery of pictures made with the program (not screenshots).
Name *
Upstream name of the packaged work.
Other-References
A URL to an upstream page containing more references.
Reference
One or more bibliographic references, represented as a mapping or a sequence of mappings containing one or more of the following keys. The values for the keys are always scalars, and keys that correspond to standard BibTeX fields must provide the same content.
Author

Author list in BibTeX-friendly syntax (separating multiple authors with the keyword "and" and using as few abbreviations as possible in the names, as proposed in http://nwalsh.com/tex/texhelp/bibtx-23.html).

Booktitle
Title of the book the article is published in
DOI
This is the digital object identifier of the academic publication describing the packaged work.
Editor
Editor of the book the article is published in
Eprint
Hyperlink to the PDF file of the article.
ISBN
International Standard Book Number of the book if the article is part of the book or the reference is a book
ISSN
International Standard Serial Number of the periodical publication if the article is part of a series
Journal
Abbreviated journal name [To be discussed: which standard to recommend?].
Number
Issue number.
Pages
Article page number(s). [To be discussed] The page number separator must be a single ASCII hyphen. What should be done with condensed notations like 401-10?
PMID

ID number in the PubMed database.

Title
Article title.
Type

A BibTeX entry type indicating what is cited. Typical values are article, book, or inproceedings. [To be discussed]. In case this field is not present, article is assumed.

URL
Hyperlink to the abstract of the article. This should not point to the full text, which is specified by Eprint. Please also do not put PubMed links here, because they would be redundant with PMID.
Volume
Journal volume.
Year
Year of publication
Debian-package
Optional: citation information can be restricted to a specific binary package of a multi-binary source package if the reference concerns only that package. Note: this is just a proposal and might change in the future.
Registration
A URL to a registration form (or instructions). This could be, for example, registration for bug-reporting accounts, or registration used to count or contact users.
Repository
URL to a repository containing the upstream sources.
Repository-Browse
A URL to browse the repository containing the upstream sources.
Screenshots

One or more URLs to upstream pages containing screenshots (not screenshots.debian.net), represented by a scalar or a sequence of scalars.

Security-Contact
The person, mailing list, forum, etc. to which security-related messages should be sent in the first place.
Webservice
URL of a web page where the packaged program can also be used.
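The CPE field can be split mechanically. The following Python sketch handles only the CPE 2.2 URI form used in the CPE example (`cpe:/part:vendor:product:...`). It does not handle CPE 2.3 formatted strings or percent-encoded characters. The function name is hypothetical.

```python
# Component order of a CPE 2.2 URI, e.g. cpe:/a:vendor:product:version
CPE_URI_COMPONENTS = ("part", "vendor", "product", "version",
                      "update", "edition", "language")


def parse_cpe_field(value: str) -> list:
    """Split a space-separated CPE field into one dict per CPE URI."""
    parsed = []
    for uri in value.split():
        if not uri.startswith("cpe:/"):
            raise ValueError(f"not a CPE 2.2 URI: {uri!r}")
        components = uri[len("cpe:/"):].split(":")
        if len(components) > len(CPE_URI_COMPONENTS):
            raise ValueError(f"too many components in {uri!r}")
        # zip() stops at the shorter sequence, so components absent from
        # the URI are simply left out of the dict.
        parsed.append(dict(zip(CPE_URI_COMPONENTS, components)))
    return parsed


print(parse_cpe_field("cpe:/a:ethereal_group:ethereal"))
# [{'part': 'a', 'vendor': 'ethereal_group', 'product': 'ethereal'}]
```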
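Reference (like Screenshots) may hold either a single item or a sequence, and its keys mirror BibTeX fields, so turning it into BibTeX is mostly mechanical. Below is a minimal Python sketch. It assumes the file has already been parsed into plain dicts and lists by a YAML loader. The function names and cite keys are hypothetical. Apart from Type and Debian-package, every key is copied through verbatim, so non-standard keys such as PMID end up as extra BibTeX fields.

```python
def as_list(value):
    """Fields such as Reference and Screenshots hold one item or a sequence."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def reference_to_bibtex(reference: dict, cite_key: str) -> str:
    """Render one Reference mapping as a BibTeX entry (sketch only)."""
    # Field names are case-insensitive, so normalise keys to lower case.
    fields = {str(k).lower(): str(v) for k, v in reference.items()}
    entry_type = fields.pop("type", "article")  # Type defaults to article
    fields.pop("debian-package", None)          # Debian-only, not BibTeX
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
    return f"@{entry_type}{{{cite_key},\n{body}\n}}"


metadata = {  # what a YAML loader might return; all values are made up
    "Reference": {
        "Author": "Jane Doe and John Smith",
        "Title": "An example article",
        "Journal": "J. Example",
        "Year": 2012,
        "Pages": "45-67",
    }
}
for number, ref in enumerate(as_list(metadata["Reference"]), start=1):
    print(reference_to_bibtex(ref, f"example{number}"))
```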

Some fields are present in debian/upstream/metadata files, but have been introduced there only for exploratory purposes, so their use is not recommended in general, especially when their contents duplicate existing information from other packaging files.

Homepage *

The packaged work's homepage. Instead of this field, set the Homepage field in debian/control.

Watch

Currently it contains the main line of debian/watch. It is therefore assumed to be in format version 3. For surveying multiple locations, it could contain a YAML sequence. Instead of this field, create a debian/watch file.

Reserved fields

The following fields are used internally and must not be present in debian/upstream/metadata.

ping
Field to trigger the gatherer, with no output returned.
YAML-ALL
Used to dump the loaded record.
YAML-URL
Used to override the repository's URL provided by debcheckout.
YAML-REFRESH-DATE

Used to determine how long Umegaya will ignore refresh requests (to avoid hammering Alioth).

TODO: ignore them safely.
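Here is one possible way for a consumer to ignore the reserved fields safely. It is a Python sketch that assumes the file has already been loaded into a dict. The function name is hypothetical, and this is not how Umegaya itself handles them. Because field names are case-insensitive, the comparison is done on lower-cased names.

```python
# Reserved field names from the list above, lower-cased because field
# names are case-insensitive.
RESERVED_FIELDS = {"ping", "yaml-all", "yaml-url", "yaml-refresh-date"}


def drop_reserved_fields(metadata: dict):
    """Return (metadata without reserved fields, names of dropped fields)."""
    kept, dropped = {}, []
    for key, value in metadata.items():
        if str(key).lower() in RESERVED_FIELDS:
            dropped.append(key)
        else:
            kept[key] = value
    return kept, dropped


cleaned, ignored = drop_reserved_fields(
    {"Name": "example", "YAML-URL": "https://example.org/repo.git"}
)
print(cleaned, ignored)  # {'Name': 'example'} ['YAML-URL']
```

Returning the dropped names as well lets a checker warn the maintainer instead of discarding the fields silently.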

Discussion

Let's discuss here, on a mailing list (debian-med or debian-qa), or a discussion page, if available.

The data is not really Debian-specific; let's put it outside Debian and use mechanisms for Mapping package names across distributions.

To do: formalise the above using Config::Model::Backend::Yaml, and generate docs as explained in http://ddumont.wordpress.com/2011/04/08/configuration-doc-generation-with-configmodel/ .

* In addition to DOAP, other Semantic Web ontologies/namespaces/schemas should be reused in order not to reinvent the wheel, and to enable such metadata to participate in the Semantic Web (see also Open Linked Data matters). As such, SPDX would be an interesting standard to link to, as well as ADMS.F/OSS, for package descriptions, IMHO. Syntactically, any form of RDF would be interesting to explicitly convey the prefixes in the field names... and I'm not sure it can be done in YAML -- OlivierBerger

Problems

BibTeX: Currently there is no way to let the braces ({}) that may be used to force capitalisation in BibTeX entries pass through from debian/upstream/metadata into BibTeX.

It seems that the Python library used to parse debian/upstream/metadata files for inclusion into UDD has a bug when values are of the form <d:d> (decimal_number colon decimal_number). Enclose such strings in single quotes (see the discussion on the Debian Med mailing list).
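The text above does not say what the bug is. One plausible explanation, which is an assumption and not confirmed here, is the YAML 1.1 base-60 ("sexagesimal") integer rule. YAML 1.1 loaders, PyYAML among them, turn an unquoted value such as 1:20 into the integer 80. The Python sketch below reimplements only that rule, using the standard library, to show the effect. Quoting the value, as recommended, keeps it a string.

```python
import re

# YAML 1.1 integer rule for base-60 ("sexagesimal") numbers. YAML 1.2
# dropped this rule, but loaders implementing YAML 1.1 still apply it.
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")


def yaml11_sexagesimal(plain_scalar: str):
    """Integer a YAML 1.1 resolver would build from an unquoted scalar, else None."""
    if not SEXAGESIMAL.fullmatch(plain_scalar):
        return None
    sign = -1 if plain_scalar.startswith("-") else 1
    value = 0
    for digits in plain_scalar.lstrip("+-").replace("_", "").split(":"):
        value = value * 60 + int(digits)
    return sign * value


print(yaml11_sexagesimal("1:20"))  # 80, so "Pages: 1:20" would not stay a string
print(yaml11_sexagesimal("1:75"))  # None: 75 is not a valid base-60 digit pair
```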

Template

Here is a template for a debian/upstream/metadata file which can be used to specify citations:

Reference:
  Author: <please use full names and separate multiple authors with the keyword "and">
  Title:
  Journal:
  Year:
  Volume:
  Number:
  Pages:
  DOI:
  PMID:
  URL:
  Eprint:

Examples

You can find lots of examples using codesearch.debian.net.

Errors

The most common error is using the string ": " inside a YAML value. This is not allowed, because ": " separates keys from values. Please quote such values or use a separate line.
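A generator or checker can quote risky values automatically. Below is a minimal Python sketch with hypothetical function names. The check is deliberately conservative, so it may quote some values that would have parsed fine unquoted. It does not cover the <d:d> case described under Problems, because that case contains no ": ". Single-quoted YAML scalars escape an embedded quote by doubling it.

```python
# Characters that can be risky at the start of a plain YAML scalar
# (conservative list).
INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")


def needs_quoting(value: str) -> bool:
    """Conservative check: True if the value is risky as a plain scalar."""
    return (
        ": " in value            # would be read as a new key: value pair
        or " #" in value         # would start a comment
        or value.endswith(":")   # a trailing colon is also a key indicator
        or value[:1] in INDICATORS
    )


def quote_if_needed(value: str) -> str:
    """Wrap risky values in single quotes, doubling any embedded quote."""
    if not needs_quoting(value):
        return value
    return "'" + value.replace("'", "''") + "'"


print("Title: " + quote_if_needed("Umegaya: a metadata collector"))
# Title: 'Umegaya: a metadata collector'
```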

Lintian check

Simon Kainz has written a preliminary lintian check to verify the syntax of debian/upstream/metadata files (see also 731340). Any testing is welcome. A simple lintian check for YAML syntax was implemented by Petter Reinholdtsen (see 813904).

Deprecated features

Hyphen shortcut for mappings

Only a subset of YAML is used: sequences are only expected to contain scalars, and mappings are only expected to contain a scalar or a mapping, with only one level of nesting.

In addition, two conventions that are not part of the YAML format are used:

  • Field names are case-insensitive.
  • Nested mappings are shortcuts for longer field names formed by joining the outer and inner field names with a hyphen. The following two examples are equivalent:

Foo:
  Bar: baz

Foo-Bar: baz
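A consumer that still wants to accept this deprecated shortcut could normalise field names before looking them up. Below is a Python sketch that assumes the file has already been parsed into dicts. Reference is a mapping in the current syntax and must not be flattened, so the sketch takes a hypothetical keep_nested set of such fields. Both conventions above are applied: names are lower-cased, and one level of nesting is joined with a hyphen.

```python
def normalise_fields(metadata: dict, keep_nested=frozenset({"reference"})) -> dict:
    """Lower-case field names and expand the deprecated nesting shortcut."""
    flat = {}
    for key, value in metadata.items():
        key = str(key).lower()  # field names are case-insensitive
        if isinstance(value, dict) and key not in keep_nested:
            # Foo: {Bar: baz}  ->  foo-bar: baz
            for sub_key, sub_value in value.items():
                flat[f"{key}-{str(sub_key).lower()}"] = sub_value
        else:
            flat[key] = value
    return flat


print(normalise_fields({"Foo": {"Bar": "baz"}}) == normalise_fields({"Foo-Bar": "baz"}))
# True
```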

UDD loading through a YAML intermediate

The bibliographic data was refreshed daily at http://upstream-metadata.debian.net/for_UDD/biblio.yaml (URL no longer valid) via a local cron job. As specified in config-org.yaml, it was retrieved by the script fetch_bibref.sh and loaded into the UDD as triples (package, key, value) with the bibref_gatherer.
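The triple layout can be illustrated with a short Python sketch. It is not the bibref_gatherer code. The function name is hypothetical, and lower-casing the keys is an assumption. With more than one reference, plain triples no longer record which reference a key came from, and this sketch does not try to fix that.

```python
def bibref_triples(package: str, reference) -> list:
    """Flatten Reference data into (package, key, value) triples."""
    references = reference if isinstance(reference, list) else [reference]
    triples = []
    for ref in references:
        for key, value in ref.items():
            triples.append((package, str(key).lower(), str(value)))
    return triples


for triple in bibref_triples("example", {"Title": "An example", "Year": 2012}):
    print(triple)
# ('example', 'title', 'An example')
# ('example', 'year', '2012')
```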

Old names for the file debian/upstream/metadata

  • debian/upstream-metadata.yaml was first used and then shortened to debian/upstream. See http://lists.debian.org/debian-devel/2012/01/msg00426.html. Migration from old file name to new file name can be handled by cme fix dpkg once a proper model for DEP-12 is created.

  • debian/upstream was then used until February 2014, when it was replaced by debian/upstream/metadata, so that debian/upstream/ became a directory usable by other programs, in particular uscan. See the archive of the debian-devel mailing list for details.
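A tool that reads older source packages may need to look for all three names. Below is a minimal Python sketch of that lookup. The function name is hypothetical and the code is not taken from any existing tool. debian/upstream is now a directory, so only a regular file at that path is treated as the old metadata file.

```python
from pathlib import Path

# Names used over time, newest first.
CANDIDATE_PATHS = (
    "debian/upstream/metadata",       # current name
    "debian/upstream",                # plain file, used until February 2014
    "debian/upstream-metadata.yaml",  # first name
)


def find_upstream_metadata(source_dir):
    """Return the path of the metadata file in an unpacked source tree, or None."""
    root = Path(source_dir)
    for relative in CANDIDATE_PATHS:
        candidate = root / relative
        # is_file() skips debian/upstream when it is a directory.
        if candidate.is_file():
            return candidate
    return None
```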

Other Upstream metadata

EDAM files

The EDAM ontology provides a means to classify software used in bioinformatics. The Debian Med team intends to link all bioinformatics tools with the EDAM ontology. To this end, the YAML file debian/upstream/edam can provide extra information.

Fields

ontology
EDAM (1.13) (currently version 1.13 is the latest EDAM version)
topic
EDAM topic
scopes
EDAM scopes