Differences between revisions 34 and 35
Revision 34 as of 2012-01-15 09:14:48
Size: 5744
Comment: How the bibliographic data is loaded in the UDD.
Revision 35 as of 2012-01-15 09:18:45
Size: 5745
Comment: Syntax correction.

Upstream MEtadata GAthered with YAml (UMEGAYA)

Introduction

I am starting an experimental effort to collect meta-information about upstream in a file called debian/upstream-metadata.yaml in the source packages maintained by the DebianMed project. Since these source packages are stored in a Subversion repository on Alioth, the information can be updated without uploading the source packages to the Debian archive.

A draft collector system is being implemented on upstream-metadata.debian.net, and its source is available on git.debian.org. The plan is to use it to prepare tables that can be fed to the UltimateDebianDatabase.

Proof of principle (in progress)

The goal is to make the DebianMed web sentinels use the UDD, fed from debian/upstream-metadata.yaml via upstream-metadata.debian.net, to display bibliographic information about which academic article to cite when using our packages. This is currently done by collecting the information in the central file used to create the med-* metapackages.

The bibliographic data is refreshed daily at http://upstream-metadata.debian.net/for_UDD/biblio.wide via a local cron job, and loaded by the UDD with the bibref_gatherer (http://anonscm.debian.org/viewvc/collab-qa/udd/udd/bibref_gatherer.py).

Syntax

The debian/upstream-metadata.yaml file is in YAML format. In its simplest form, it looks much like the paragraph format used in Debian control files. Nevertheless, there may sometimes be unexpected behaviours; for instance, field contents that contain a colon have to be quoted in some cases. When in doubt, validators are available, such as this Online YAML Parser. With the libyaml-perl package installed, the following command can also validate YAML files:

perl -MYAML -e '$/="";  Load(<STDIN>)' < upstream-metadata.yaml

Only a subset of YAML is used: sequences are only expected to contain scalars, and mappings are only expected to contain scalars or mappings, with only one level of nesting.
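The restriction above can be checked mechanically once a file has been parsed. The following is only a minimal sketch, assuming the document has already been loaded into plain Python dicts, lists and scalars by some YAML parser; the function names is_scalar and check_subset are hypothetical, and allowing a sequence directly under a top-level field is an assumption based on the Watch field description.

```python
def is_scalar(value):
    """Scalars as a YAML parser would return them: strings, numbers, booleans, None."""
    return value is None or isinstance(value, (str, int, float, bool))


def check_subset(data):
    """Return a list of problems found in an already-parsed metadata document.

    Assumption: `data` is the top-level mapping, parsed into a dict.
    """
    if not isinstance(data, dict):
        return ["top level is not a mapping"]
    problems = []
    for field, value in data.items():
        if is_scalar(value):
            continue
        if isinstance(value, list):
            # Sequences are only expected to contain scalars.
            if not all(is_scalar(item) for item in value):
                problems.append(f"{field}: sequence contains a non-scalar item")
        elif isinstance(value, dict):
            # Only one level of nesting: the inner mapping must hold scalars.
            for subfield, subvalue in value.items():
                if not is_scalar(subvalue):
                    problems.append(f"{field}-{subfield}: nested value is not a scalar")
        else:
            problems.append(f"{field}: unsupported value type")
    return problems
```

An empty list means the structure stays within the subset; each string otherwise names the offending field.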

In addition, two conventions that are not part of the YAML format are used:

  • Field names are case-insensitive.
  • Nested mappings are shorthand for longer field names formed by joining the outer and inner field names with a dash. The following two examples are equivalent:

Foo:
  Bar: baz

Foo-Bar: baz
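A consumer of these files could apply both conventions in one normalisation step. This is a sketch only, assuming the YAML has already been parsed into Python dicts with string keys; the function name normalise and the choice of lower case as the canonical spelling are assumptions, not something the collector is known to do.

```python
def normalise(data):
    """Flatten one level of nested mappings and lower-case the field names."""
    flat = {}
    for field, value in data.items():
        if isinstance(value, dict):
            # "Foo: {Bar: baz}" is treated as the longer field name "Foo-Bar".
            for subfield, subvalue in value.items():
                flat[f"{field}-{subfield}".lower()] = subvalue
        else:
            flat[field.lower()] = value
    return flat


nested = {"Foo": {"Bar": "baz"}}
dashed = {"Foo-Bar": "baz"}
# Both spellings normalise to the same flat mapping.
assert normalise(nested) == normalise(dashed)
```

After this step, lookups only need to deal with flat, lower-case field names.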

Fields

In alphabetical order. Let's use the same vocabulary as DOAP as much as possible. Fields that are the same as in DOAP are followed by an asterisk.

Archive
When the upstream work is part of a large archive, like CPAN.
Bug-Database
A URL to the list of known bugs for the project.
Bug-Submit
A URL that is the place where new bug reports should be sent.
Contact
The person, mailing list, forum, etc. to which messages should be sent in the first place.
DOI
This is the digital object identifier of the academic publication describing the packaged work.
Donation
A URL to a donation form (or instructions).
FAQ
A URL to the online FAQ.
Gallery
A URL to a gallery of pictures made with the program (not screenshots).
Name *
Upstream name of the packaged work.
Homepage *
The packaged work's homepage.
PMID
Same as the DOI, but with the ID number in the PubMed database.

Reference
The following fields are used to document the academic publication describing the packaged work, and are usually pasted from BibTeX references. There is a big issue to solve: what if the Debian package contains more than one work, published in different articles? Also, some fields that can be used independently, the DOI and the PubMed ID, have a shorter name that does not start with Reference-.

Reference-Author
Author list.
Reference-Eprint
Hyperlink to the PDF file of the article.
Reference-Journal
Abbreviated journal name.
Reference-Number
Issue number.
Reference-Pages
Article page number(s).
Reference-Title
Article title.
Reference-URL
Hyperlink to the electronic version of the article.
Reference-Volume
Journal volume.
Reference-Year
Year of publication.
References
A URL to an upstream page containing more references.
Registration
A URL to a registration form (or instructions).
Repository
URL to a repository containing the upstream sources.
Repository-Browse
A URL to browse the repository containing the upstream sources.
Screenshots
URL to an upstream page containing screenshots (not screenshots.debian.net).
Watch
Currently it contains the main line of debian/watch. It is therefore assumed to be in format version 3. For surveying multiple locations, it could contain a YAML sequence.

Webservice
URL to a web page where the packaged program can also be used.
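As an illustration of how the Reference-* fields, together with DOI and PMID, could be turned into a line telling users which article to cite, here is a rough sketch. It assumes the fields have already been parsed and flattened into a Python dict; the function name format_citation, the display order and the punctuation are all hypothetical, and the values in the example are placeholders rather than a real publication.

```python
# Hypothetical display order; the page does not prescribe a citation style.
ORDER = ["author", "year", "title", "journal", "volume", "number", "pages"]


def format_citation(fields):
    """Join the Reference-* fields that are present into a one-line citation."""
    # Field names are case-insensitive, so compare them in lower case.
    lower = {str(key).lower(): value for key, value in fields.items()}
    parts = [
        str(lower["reference-" + name])
        for name in ORDER
        if "reference-" + name in lower
    ]
    # DOI and PMID use short names that do not start with "Reference-".
    if "doi" in lower:
        parts.append("doi:" + str(lower["doi"]))
    if "pmid" in lower:
        parts.append("PMID:" + str(lower["pmid"]))
    return ". ".join(parts)


# Placeholder values, not a real publication.
example = {
    "Reference-Author": "A. Author and B. Author",
    "Reference-Title": "An example title",
    "Reference-Year": 2012,
    "DOI": "10.0000/example",
}
print(format_citation(example))
```

Fields that are absent are simply skipped, so a package that only documents a DOI still yields a usable line.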

Discussion

Let's discuss here, on a mailing list (debian-med or debian-qa), or a discussion page, if available.

The data is not really Debian-specific; let's put it outside Debian and use the PackageMap to map between Debian package names and the data:

http://lists.debian.org/debian-mentors/2009/11/msg00450.html

To do: formalise the above using http://search.cpan.org/~ddumont/Config-Model/lib/Config/Model/Backend/Yaml.pm, and generate docs as explained in http://ddumont.wordpress.com/2011/04/08/configuration-doc-generation-with-configmodel/ .