Differences between revisions 54 and 55
Revision 54 as of 2012-03-28 06:26:49
Size: 7500
Editor: AndreasTille
Comment: Typo
Revision 55 as of 2012-03-28 06:31:57
Size: 7560
Editor: AndreasTille
Comment: Inject the other change Suggested by Charles to allow also sequences of Screenshots

Upstream MEtadata GAthered with YAml (UMEGAYA)

Introduction

This is an effort to collect meta-information about upstream in a file called debian/upstream in the source packages maintained in a publicly accessible version control system (VCS), currently Subversion or Git. Since this information is directly accessed from the VCS, it can be updated without uploading the source packages to the Debian archive.

Umegaya is also the name of a draft collector system that is implemented on upstream-metadata.debian.net. Its source is available on git.debian.org. It is used to feed the data into the UltimateDebianDatabase.

Proof of principle (in progress)

First attempt through a YAML intermediate

The goal is to make the DebianMed web sentinels use the UDD, fed from debian/upstream via upstream-metadata.debian.net, to display bibliographic information about which academic article to cite when using our packages. This is currently done by collecting the information in the central file used to create the med-* metapackages.

The bibliographic data is refreshed daily at http://upstream-metadata.debian.net/for_UDD/biblio.yaml via a local cron job. As specified in config-org.yaml, it is retrieved by the script fetch_bibref.sh and loaded into the UDD as triples (package, key, value) with the bibref_gatherer.

Second attempt through a pool of files

The Umegaya instance running at http://upstream-metadata.debian.net is collecting and organising debian/upstream and debian/copyright files as pools. Currently they are pushed manually to the packages-metadata directory of the QA team's Subversion repository. A UDD importer is in development.

Syntax

The debian/upstream file is in YAML format. In its simplest form, it looks much like the paragraph format used in Debian control files. Nevertheless, YAML sometimes behaves unexpectedly; for instance, field contents that contain a colon have to be quoted in some cases. When in doubt, there are validators available, like this Online YAML Parser. With the libyaml-libyaml-perl package installed, the following command can also validate YAML files:

perl -MYAML::XS -e '$/="";  Load(<STDIN>)' < upstream

Be careful not to use the plain Perl YAML module as it accepts files with invalid syntax (661700).
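
As a minimal illustration of the colon pitfall mentioned above (the package name and URL are hypothetical, not taken from any real package), a value containing a colon followed by a space has to be quoted, while a colon inside a URL does not:

```yaml
# Invalid if uncommented: the second ": " makes the parser expect a nested mapping.
# Name: Foo: a bar toolkit

# Valid: the quotes make the whole value a single scalar.
Name: "Foo: a bar toolkit"

# Valid without quotes: the colon in "http:" is not followed by a space.
Homepage: http://www.example.org/foo
```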

Only a subset of YAML is used: sequences are only expected to contain scalars, and the values of a mapping are only expected to be scalars or mappings, with only one level of nesting.

In addition, two conventions that are not part of the YAML format are used:

  • Field names are case-insensitive.
  • Nested mappings are shortcuts for longer field names composed of both mapping field names separated by a dash. The following two examples are equivalent:

Foo:
  Bar: baz

Foo-Bar: baz
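
Applied to field names from this page, and assuming the convention covers them in the same way, the two fragments below would be read identically. The URLs are placeholders, and the fragments are written as two YAML documents separated by --- only to keep them apart; a real debian/upstream file would use one form or the other. Because field names are case-insensitive, bug-database would name the same field as Bug-Database.

```yaml
# Nested form: one level of nesting under "Bug".
Bug:
  Database: http://bugs.example.org/foo
  Submit: http://bugs.example.org/foo/new
---
# Flat form: the two mapping field names joined by a dash.
Bug-Database: http://bugs.example.org/foo
Bug-Submit: http://bugs.example.org/foo/new
```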

Fields

In alphabetic order. Let's try to use the same vocabulary as in DOAP as much as possible. Fields that are the same as in DOAP are followed by an asterisk.

Archive
When the upstream work is part of a large archive, like CPAN.
Bug-Database
A URL to the list of known bugs for the project.
Bug-Submit
A URL that is the place where new bug reports should be sent.
Contact
Which person, mailing list, forum, etc. to send messages to in the first place.
Donation
A URL to a donation form (or instructions).
FAQ
A URL to the online FAQ.
Gallery
A URL to a gallery of pictures made with the program (not screenshots).
Name *
Upstream name of the packaged work.
Homepage *
The packaged work's homepage.
Other-References
A URL to an upstream page containing more references.
Reference
One or more bibliographic references, represented as a mapping or sequence of mappings containing one or more of the following keys. The values for the keys are always scalars, and the keys that correspond to standard BibTeX entries must provide the same content.
Author
Author list.
DOI
This is the digital object identifier of the academic publication describing the packaged work.
Eprint
Hyperlink to the PDF file of the article.
Journal
Abbreviated journal name.
Number
Issue number.
Pages
Article page number(s).
PMID
ID number in the PubMed database.
Title
Article title.
URL
Hyperlink to the electronic version of the article.
Volume
Journal volume.
Year
Year of publication.
Registration
A URL to a registration form (or instructions).
Repository
URL to a repository containing the upstream sources.
Repository-Browse
A URL to browse the repository containing the upstream sources.
Screenshots
One or more URLs to upstream pages containing screenshots (not screenshots.debian.net), represented by a scalar or a sequence of scalars.
Watch
Currently it contains the main line of debian/watch. It is therefore assumed to be in format version 3. For surveying multiple locations, it could contain a YAML sequence.
Webservice
URL to a web page where the packaged program can also be used.
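
Putting several of the fields above together, a debian/upstream file might look like the sketch below. Every value is a made-up placeholder: the package foo, the example.org URLs, the bibliographic data and the DOI do not refer to anything real. Only the field names and their scalar, sequence or mapping shapes come from the descriptions above.

```yaml
Name: Foo
Homepage: http://www.example.org/foo
Contact: foo-users@lists.example.org
Bug-Database: http://bugs.example.org/foo
Bug-Submit: http://bugs.example.org/foo/new
Repository: http://git.example.org/foo.git
Repository-Browse: http://git.example.org/browse/foo
# Screenshots as a sequence of scalars; a single URL could be a plain scalar instead.
Screenshots:
  - http://www.example.org/foo/screenshots.html
  - http://www.example.org/foo/tour.html
# Reference as a sequence of mappings; a single reference could be one mapping.
Reference:
  - Author: Doe, Jane and Roe, Richard
    Title: "Foo: a hypothetical toolkit for bar analysis"
    Journal: J. Hypothet. Res.
    Year: 2012
    Volume: 1
    Number: 2
    Pages: 10-20
    DOI: 10.1000/example.foo
    URL: http://www.example.org/articles/foo
    Eprint: http://www.example.org/articles/foo.pdf
  - Author: Roe, Richard
    Title: An earlier hypothetical description of Foo
    Year: 2011
# The main line of a version 3 debian/watch file.
Watch: http://www.example.org/foo/downloads/foo-(.*)\.tar\.gz
```

The first Title is quoted because it contains a colon followed by a space; the other values are plain scalars.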

Reserved fields

The following fields are used internally and must not be present in debian/upstream.

YAML-ALL
Used to dump the loaded record.
YAML-URL
Used to override the repository's URL provided by debcheckout.
YAML-REFRESH-DATE
Used to deduce how long umegaya will ignore calls to refresh (to avoid hammering Alioth).

TODO: ignore them safely.

Discussion

Let's discuss here, on a mailing list (debian-med or debian-qa), or a discussion page, if available.

The data is not really Debian-specific; let's put it outside Debian and use the PackageMap to map between Debian package names and the data:

http://lists.debian.org/debian-mentors/2009/11/msg00450.html

To do: formalise the above using http://search.cpan.org/~ddumont/Config-Model/lib/Config/Model/Backend/Yaml.pm, and generate docs as explained in http://ddumont.wordpress.com/2011/04/08/configuration-doc-generation-with-configmodel/ .

* In addition to DOAP, other Semantic Web ontologies/namespaces/schemas should be reused in order not to reinvent the wheel, and to enable such metadata to participate in the Semantic Web (see also Open Linked Data matters). As such, SPDX would be an interesting standard to link to, as well as ADMS.F/OSS, for package descriptions, IMHO. Syntactically, any form of RDF would be interesting to explicitly convey the prefixes in the field names... and I'm not sure it can be done in YAML -- OlivierBerger

Notes

  • debian/upstream-metadata.yaml was formerly used instead of debian/upstream.