Differences between revisions 18 and 19
Revision 18 as of 2010-10-07 19:36:21
Size: 8254
Editor: ?ckk
Comment: spelling/grammar
Revision 19 as of 2010-10-08 06:16:32
Size: 8376
Editor: AndreasTille
Comment: Link to DebTags coordination page

Problems to work on in Debian Science

At the Debian Science round table at DebConf 10, some problems were discussed which did not find an immediate solution. This page tries to summarise the discussion to enable further discussion.

BibTeX files

The NeuroDebian team (Yaroslav Halchenko & Michael Hanke) proposed a package named debian-bibliography which provides a BibTeX bibliography for Debian documents (shipped as /usr/share/bib/debian.bib), and proposed storing references to software packages within the machine-readable debian/copyright file. The proposal met with much applause, but the question arose how the source of the BibTeX input should be stored. Andreas Tille suggested writing a dh_installbibtex helper which extracts the information from whatever source is chosen and moves it right into place. There were three competing suggestions for how the data can be stored in the source package:

debian/copyright

The NeuroDebian team suggested using debian/copyright to store BibTeX entries under a References field. The audience discussed at length the merits of putting the information into debian/copyright.

Pro
BibTeX info is more or less connected to the copyright holder; it is propagated automatically to packages.debian.org (and other places where the copyright file is rendered)
Con

BibTeX is not really copyright info in most cases (but might be highly relevant where a respective "attribution" is requested); debian/copyright need not be machine-parseable, so it is hard to get the information in a structured way
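To make the parsing concern concrete, here is a minimal sketch of how a helper might pull a References field out of debian/copyright. It assumes the file uses the machine-readable layout ("Field: value" lines, indented continuation lines, paragraphs separated by blank lines). The field name References follows the proposal above; the sample content and the function names are hypothetical.

```python
def parse_paragraphs(text):
    """Split control-file style text into a list of {field: value} dicts."""
    paragraphs, fields, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph.
            if fields:
                paragraphs.append(fields)
            fields, last = {}, None
        elif line[0] in " \t":
            # An indented line continues the previous field.
            if last is not None:
                fields[last] += "\n" + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last] = value.strip()
    if fields:
        paragraphs.append(fields)
    return paragraphs


def extract_references(text):
    """Return the content of every References field found in the text."""
    return [p["References"].strip()
            for p in parse_paragraphs(text) if "References" in p]


# Hypothetical sample; not taken from any real package.
SAMPLE = """\
Format: hypothetical machine-readable layout
Name: example-tool

Files: *
Copyright: 2010 Example Author
License: GPL-2+
References:
 @article{example2010,
   title = {An Example Title},
   year = {2010}
 }
"""

print(extract_references(SAMPLE)[0])
```

A free-form debian/copyright would not survive this kind of parsing, which is exactly the objection raised under "Con".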

debian/bib

(??) proposed using a file debian/bib, which would be installed as /usr/share/bib/<pkgname>

Pro
Format can be plain BibTeX; simple
Con
Yet another file to process; no clear path for using this file outside the installed package scope
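The install step for this variant could look roughly like the following sketch. It only illustrates the copy from debian/bib to /usr/share/bib/<pkgname>; the function name install_bib and the idea of passing the package build directory as dest_root are assumptions, not an existing helper.

```python
import shutil
from pathlib import Path


def install_bib(source_dir, package, dest_root):
    """Copy <source_dir>/debian/bib to <dest_root>/usr/share/bib/<package>.

    Returns the installed path, or None if the package ships no debian/bib.
    """
    src = Path(source_dir) / "debian" / "bib"
    if not src.is_file():
        return None
    target = Path(dest_root) / "usr" / "share" / "bib" / package
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, target)
    return target
```

In a debhelper-style build, dest_root would presumably be the package build directory (debian/<pkgname>), so that the file ends up at the right place in the binary package.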

debian/upstream-metadata.yaml

Pro

The content is actually upstream metadata; BibTeX info could be read in a structured way from an appropriate field; UltimateDebianDatabase integration of upstream-metadata.yaml is nearly done and thus propagation to the Debian Science tasks pages is not far away

Con
Having structured data (BibTeX format) embedded in another data format (YAML) seems a bit strange

More fine grained tasks

The BOF also discussed the problem that some tasks in DebianScienceBlend and DebianMed have grown quite large and probably need to be split so that users do not get lost in the amount of information. This is a real problem and there are three potential solutions:

Make smaller tasks

Probably the simplest way to handle tasks which have grown to contain a lot of packages is to split the task in question. The main argument against such a split is that some packages might fit in all the new tasks and thus need to be mentioned in each of them. The answer has been given on the mailing lists several times: there is no reason why a package should not be mentioned in more than one task. We are not doing an exclusive classification. We are providing packages for certain tasks. If a package is useful for more than one task it makes perfect sense to mention it in all these tasks.

Start a new Blend

Andreas Tille once advocated creating a general Debian Science Blend as an umbrella for those sciences which do not have enough supporters to run a specific Blend. While chemistry is covered by DebiChem and microbiology is a part of Debian Med, other sciences have no dedicated Blend of their own and have remained under this umbrella up to now. While this is perfectly fine, splitting off a Debian Mathematics and a Debian Physics Blend should be considered. Both sciences could build more fine-grained tasks, several of which might have quite similar content. The main question is whether there are enough supporters for such an attempt, because the success of a Blend depends heavily on the people who are involved. Experience with the Debian Med and DebiChem people has shown that all of them maintain a strong connection to Debian Science anyway, so there is no real danger of "losing" the supporters of Debian Mathematics or Debian Physics from the general sciences, because there are several common topics (see the other sections on this wiki page).

Make better use of DebTags and find a better way to visualise DebTags

DebTags is another way to categorise packages in Debian - there are even people who regard DebTags as a "competing" technique to metapackages in Debian. In fact, when installing packages via ept-cache/axi-cache, you get functionality similar to installing a metapackage. Both methods have pros and cons, but these should not be discussed here, and a Blend is not only about installing some metapackages.

If Debian Science manages to define a reasonable set of DebTags which enables a reasonable separation of the packages in one task, there should be ways to visualise this, for instance on the tasks pages. Some sectioning according to DebTags comes to mind, for example.

As a side note, DebTags might also be useful to verify whether all interesting packages are really mentioned in the tasks files. It is known that not all maintainers of scientific software know about Debian Science, and some might ignore it intentionally, so there might be software in Debian which is interesting for some task but is not listed in it. So running a DebTags-based search and comparing the result with the content of the corresponding task could be a good idea.
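The comparison described above boils down to a set difference. The following is a minimal sketch under the assumption that the tag data is already available as a mapping from package name to tags; the function name, the package names and the tags in the example are purely illustrative.

```python
def missing_from_task(package_tags, task_packages, wanted_tags):
    """Return packages carrying any wanted tag that the task does not list."""
    wanted = set(wanted_tags)
    listed = set(task_packages)
    return sorted(
        name for name, tags in package_tags.items()
        if wanted & set(tags) and name not in listed
    )


# Illustrative data only; package names and tags are made up.
package_tags = {
    "pkg-a": ["field::physics", "role::program"],
    "pkg-b": ["field::physics"],
    "pkg-c": ["field::biology"],
}
task_packages = ["pkg-a"]

print(missing_from_task(package_tags, task_packages, ["field::physics"]))
```

Every package in the result is a candidate for the task, to be reviewed by a human rather than added automatically.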

Pursuit of the above proposals will additionally require at least a review of current DebTags and tagging practices for relevant packages. To facilitate the discussion of design and implementation details for these proposals, a separate subpage ProblemsToWorkOn/DebTags has been created.

Giving credit to upstream

For the moment there are two ways to give upstream some credit on the tasks pages: on the one hand we publish popcon data, and on the other hand we provide a Registration URL in case such a thing exists.

(FIXME: Other methods were discussed in the BOF - please add them here to complete the list.)

Enable pinning to defined versions of programs

One problem of using software in scientific research is that you sometimes need to reproduce absolutely identical results from the same data, and this is sometimes only possible with identical versions of a certain piece of software. There are two approaches which might address this problem.

Providing and using test suites

We should try to convince upstream to provide a reasonable set of test data which can be processed in the package build process and must reproduce a corresponding result set. This might be approached by calling a (to be written) dh_runtestsuite (or something like this), which can be switched on and off by some variable (to reduce stress on weak architecture autobuilders) and which calls a maintainer-provided script debian/<pkg>.testsuite.
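A rough sketch of what such a helper might do follows. It is not an existing tool. As an assumption, the on/off switch reuses the established "nocheck" keyword in DEB_BUILD_OPTIONS; the proposal itself only says "some variable". The function name is hypothetical as well.

```python
import os
import subprocess
from pathlib import Path


def run_testsuite(source_dir, package, environ=None):
    """Run debian/<package>.testsuite unless tests are switched off.

    Returns None if the run was skipped, otherwise the script's exit status.
    """
    environ = os.environ if environ is None else environ
    # Assumption: "nocheck" in DEB_BUILD_OPTIONS disables the test run.
    if "nocheck" in environ.get("DEB_BUILD_OPTIONS", "").split():
        return None
    script = (Path(source_dir) / "debian" / f"{package}.testsuite").resolve()
    if not script.is_file():
        return None
    # Run the maintainer-provided script from the top of the source tree.
    result = subprocess.run(["sh", str(script)], cwd=str(source_dir))
    return result.returncode
```

A non-zero return value would then make the build fail, while None simply means that no test was run, for instance on a weak autobuilder.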

snapshot.debian.org

We should consider using http://snapshot.debian.org/ to pin packages to a certain version which is not necessarily available in any current release (neither stable, nor testing, nor unstable). Even http://backports.debian.org/ would not really help with the problem above, because the intent of backports is to provide the latest versions to users of stable, whereas scientists rather want to use older versions of a program in a more recent installation.

This suggestion has a lot of technical implications (will the snapshot version work with installed libraries etc.) and there is no clear suggestion how to technically establish the pinning to a certain version, but for the moment it should be discussed whether this idea makes some sense at all and would really help to solve the problem above.
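Purely as a basis for that discussion, one conceivable mechanism is an APT source line pointing at a dated snapshot plus an APT preferences stanza for the wanted version. The sketch below only generates the two text fragments. The package name, version and timestamp are placeholders, and whether this is a sensible way to do the pinning is exactly the open question.

```python
def snapshot_source_line(timestamp, suite="unstable", component="main"):
    """Build a sources.list line for a dated snapshot of the main archive."""
    return (f"deb http://snapshot.debian.org/archive/debian/{timestamp}/ "
            f"{suite} {component}")


def pin_stanza(package, version, priority=1001):
    """Build an APT preferences stanza pinning a package to one version.

    A priority above 1000 lets APT install the version even if that
    means a downgrade.
    """
    return (f"Package: {package}\n"
            f"Pin: version {version}\n"
            f"Pin-Priority: {priority}\n")


# Placeholder values, not a recommendation for any real package.
print(snapshot_source_line("20101008T000000Z"))
print(pin_stanza("example-tool", "1.2.3-1"))
```

This does nothing about the library compatibility issue mentioned above; it only shows what the pinning itself might look like.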

DebTags

There is a separate page, DebianScience/ProblemsToWorkOn/DebTags, to coordinate the DebTags effort.