Differences between revisions 5 and 6
Revision 5 as of 2010-12-27 12:54:51
Size: 37975
Editor: ?MonicaRamirezArceda
Comment: Remove the draft comment.
Revision 6 as of 2015-04-04 22:11:07
Size: 38135
Editor: ?DavidSinquin
Comment: EDOS tools replaced with Dose Tools




Debian Package Information

Debian Women IRC Training Session held by Enrico Zini, 16-Dec-2010

This tutorial takes you on a tour of Debian package information. There is a lot of it: some we see every day, and some we would never suspect exists, but it is there.

Requirements

In this tutorial it is assumed that:

  • you know the basics of Debian packages: what a package is, how to install and uninstall packages...
  • you understand general command line use

Technical requirements:

Basic package information

The thing you see every day in Debian is packages: there are loads of them, we usually say on the order of 25000 or so. We install and remove packages from our systems, upload new versions of them and so on.

Binary package information

We're probably all used to seeing package information with apt-cache. For example, to show information about the package debtags we run:

$ apt-cache show debtags

Every package has:

  • A name, the format of the name is defined by the Debian Policy: for example, it cannot contain underscores, but it can contain dashes.

  • A version, with a more interesting format. The policy defines it as well as how to compare two versions, which is a remarkably interesting problem.

  • Dependencies, which package managers such as apt or software-center use to decide what is needed for the package to work.

  • Descriptions, which are used by people to decide whether they'd like to install the package or not.
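
To give a feeling for why comparing two versions is an interesting problem, here is a minimal Python sketch of the comparison rules as Debian Policy describes them (epoch, upstream version, Debian revision, with ~ sorting before everything). The function names are made up for this example; it is an illustration, not the code that dpkg or apt actually run.

```python
def _order(ch):
    """Sort weight of one non-digit character ('' means end of string)."""
    if ch == "~":
        return -1          # '~' sorts before everything, even the end
    if ch == "":
        return 0
    if ch.isalpha():
        return ord(ch)     # letters sort before other characters
    return ord(ch) + 256


def _cmp_part(a, b):
    """Compare two upstream versions or two revisions; returns -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # First compare the non-digit prefixes character by character.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ca = a[i] if i < len(a) and not a[i].isdigit() else ""
            cb = b[j] if j < len(b) and not b[j].isdigit() else ""
            if _order(ca) != _order(cb):
                return -1 if _order(ca) < _order(cb) else 1
            if ca:
                i += 1
            if cb:
                j += 1
        # Then compare the following runs of digits as numbers.
        si, sj = i, j
        while i < len(a) and a[i].isdigit():
            i += 1
        while j < len(b) and b[j].isdigit():
            j += 1
        na, nb = int(a[si:i] or 0), int(b[sj:j] or 0)
        if na != nb:
            return -1 if na < nb else 1
    return 0


def split_version(version):
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 if version a is older than, equal to or newer than b."""
    ea, ua, ra = split_version(a)
    eb, ub, rb = split_version(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _cmp_part(ua, ub) or _cmp_part(ra, rb)


print(compare_versions("1.7.11", "1.7.9"))    # numeric, not alphabetic
print(compare_versions("7.0.50~", "7.0.50"))  # '~' makes a version older
```

The tilde rule is what lets a version such as 7.0.50~ sort before 7.0.50.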

Information about packages is used for many different tasks, some are performed by machines and some by humans. These tasks can be nontrivial: dependency resolution is a complex task (so complex there are research centers devoted to studying the problem, which is great because they hire Debian people :))

Another complex task is finding the packages you need: very often we really, really need a package that is in Debian but we don't know how to find it. So a good description is important, not only to find a package, but also to evaluate it and to compare it with its alternatives before installing it. You're probably familiar with that.

There are other interesting things in the output of apt-cache, like how big the package is. Maybe nowadays we don't care anymore how big the software we install on an average desktop is, but it does matter on smaller systems. It'd be nice to have a package manager able to compute the space that would be used by a package and all its dependencies; at the moment we don't have that.

Some fields in the output of apt-cache are recent additions:

  • Homepage: is a nice field. We can learn more about a package by just visiting its website. It's a simple addition that makes package managers much more useful. It'd be nice to have a system that automatically checks the Homepage: fields for broken links: I'm not aware of it existing yet.

  • Tag: holds categories for the package. There are lots of them available for use. A useful thing about Tag:, seen together with the package description, is that it gives you lots of extra information, such as what programming language the package is written in or what UI toolkit it uses. All this information can be interesting but should really not be in the package description. (See the Debtags section for more details.)

The information you see in apt-cache show debtags comes from Packages files. They are found in Debian mirrors and CDs and acquired by apt when you do apt-get update.

You can see your local copies of Packages files acquired by apt:

$ ls /var/lib/apt/lists/

If you want to find Packages files on mirrors you can go to: http://ftp.debian.org/debian/dists/squeeze/main/binary-armel (for the people who run armel). Every combination of distribution, suite and architecture has a different Packages file. On any computer, apt needs to download at least 2 of them: the one for your architecture and the one for the all architecture. Then, it does some merging and indexing and builds the .bin files in /var/cache/apt that it uses to access the package information efficiently.
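
A Packages file is plain text: one stanza of Field: value lines per package, stanzas separated by blank lines, and continuation lines starting with a space. As a rough sketch of what reading one involves (the function name and the sample text are invented for this example, and apt's real parser is far more careful), in Python:

```python
def parse_stanzas(text):
    """Parse Packages-style text into a list of {field: value} dicts."""
    stanzas, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last = {}, None
        elif line[0] in " \t" and last is not None:
            # A continuation line belongs to the previous field.
            current[last] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            last = name.strip()
            current[last] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


# Illustrative sample only: the description text is not the real one.
SAMPLE = """\
Package: debtags
Version: 1.7.11
Section: admin
Description: short description goes here
 the long description continues on indented lines

Package: libc6
Source: eglibc
Section: libs
"""

packages = parse_stanzas(SAMPLE)
for package in packages:
    print(package["Package"], package.get("Version", "(no version in sample)"))
```

To try it on real data, you could read one of the uncompressed files in /var/lib/apt/lists/ and pass its contents to parse_stanzas.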

The information we have seen so far is about binary packages. A binary package is the one you install on your machine. It's called binary because it's been made ready for use by the computer. It is not the source that you download from the package author: it's been compiled and, in a sense, preinstalled so that it can simply be unpacked on your system.

Source package information

In Debian we also have Source packages: that is, you can download the source code of any package in Debian.

You can find information about the sources of the debtags package by running:

$ apt-cache showsrc debtags

It doesn't work on all systems: you need to have source entries in /etc/apt/sources.list. Something like:

deb-src http://ftp.uk.debian.org/debian/ sid main

If you have deb-src sources, apt will download Sources files from the mirrors and make them available when you run apt-cache showsrc. In http://ftp.debian.org/debian/dists/squeeze/main/source/ you can see the Sources files on the mirror. There is a different Sources file per combination of (distribution, suite), but a single source package covers all architectures: it is compiled once per architecture to build the various binary packages.

Let's see an example of source package information:

$ apt-cache showsrc debtags
Package: debtags
Binary: debtags
Version: 1.7.11
Priority: optional
Section: admin
Maintainer: Enrico Zini <enrico@debian.org>
Build-Depends: debhelper (>= 7.0.50~), dh-buildinfo, pkg-config, apt, libwibble-dev (>= 0.1.15), libwibble-dev (<< 0.2), libtagcoll2-dev (>= 2.0.4), libtagcoll2-dev (<< 2.1), libept-dev (>= 1.0), libept-dev (<< 2), zlib1g-dev, python-docutils
Architecture: any
Standards-Version: 3.9.0.0
Format: 3.0 (native)
Directory: pool/main/d/debtags
Files:
 b2dba0b15649a84deb8cae7a925cbcd6 1724 debtags_1.7.11.dsc
 8dedeb51909a0ae52c3550a58325b660 1015917 debtags_1.7.11.tar.gz
Homepage: http://debtags.alioth.debian.org
Vcs-Browser: http://git.debian.org/?p=debtags/debtags.git;a=summary
Vcs-Git: git://git.debian.org/debtags/debtags.git
Checksums-Sha1:
 4ca136d4bcef5a3bd5581b9c8215304384b21186 1015917 debtags_1.7.11.tar.gz
Checksums-Sha256:
 ed86f2a799cb351c28fe676b3068acd22755d6fbf266fb20edac13dea612a7bf 1015917 debtags_1.7.11.tar.gz

Some information, like the package name, version and maintainers, is similar to the binary package information. Some is different:

  • Build-Depends: (instead of Depends:) are the binary packages you need to build this source package. They are usually different from Depends: for example you need gcc to compile many packages, but not to run them.

  • Vcs-Browser: and the other Vcs-* fields are another very welcome recent addition. They tell you where you can find the sources of the package in a version control system. Suppose you find a bug in a package: you can use apt-cache showsrc to see where its code is, check it out and start hacking on it.
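
The Build-Depends: line in the debtags example is a comma-separated list of package names, each with an optional version constraint in parentheses. A small Python sketch of splitting such a line into (name, operator, version) triples could look like this. The helper name is invented here, and the sketch deliberately ignores alternatives written with | and architecture restrictions in square brackets, which real dependency fields can also contain.

```python
import re

# One dependency: a package name, optionally followed by "(op version)".
_DEP = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.-]*)"
    r"\s*(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_depends(value):
    """Split a simple dependency field into (name, operator, version) triples."""
    deps = []
    for item in value.split(","):
        match = _DEP.match(item)
        if match is None:
            # Alternatives ("a | b") and "[arch]" qualifiers end up here.
            raise ValueError(f"unsupported dependency syntax: {item!r}")
        deps.append((match["name"], match["op"], match["version"]))
    return deps


# A shortened copy of the Build-Depends field shown above.
BUILD_DEPENDS = (
    "debhelper (>= 7.0.50~), dh-buildinfo, pkg-config, "
    "libept-dev (>= 1.0), libept-dev (<< 2)"
)

for name, op, version in parse_depends(BUILD_DEPENDS):
    print(name, op or "", version or "")
```

Note how libept-dev appears twice: the two constraints together express "at least 1.0 but older than 2".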

In the description of binary packages you have the interesting header Source:, which doesn't always appear. It tells you the name of the source package, which is not always the same as the binary package name: one source package can generate many binary packages. For example:

$ apt-cache show libc6
...
Source: eglibc
...

There is no libc6 source package: libc6 is generated by the eglibc sources. apt tells you that if you want to see the sources of libc6, you need to get the eglibc source package.

The Source: header is omitted when the names of the source and binary packages are the same. apt-cache showsrc libc6 is smart enough to see the Source: header and show you the right source package anyway.
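
The rule just described (use Source: when present, otherwise fall back to the binary package name) is short enough to sketch in Python. The function name is made up for this example, and the stanzas are plain dictionaries of header names to values. One detail worth knowing is that the Source: value may carry a version in parentheses after the name, so the sketch keeps only the first word.

```python
def source_package(stanza):
    """Return the source package name for a binary package stanza."""
    source = stanza.get("Source")
    if not source:
        # Header omitted: source and binary package share the same name.
        return stanza["Package"]
    # "eglibc (2.11.2-7)" -> "eglibc"
    return source.split()[0]


print(source_package({"Package": "libc6", "Source": "eglibc"}))
print(source_package({"Package": "debtags"}))
```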

You have the opposite header in apt-cache showsrc. For example,

$ apt-cache showsrc eglibc
...
Binary: libc-bin, libc-dev-bin, glibc-doc, eglibc-source, locales, locales-all, nscd, libc6, libc6-dev, libc6-dbg, libc6-prof, libc6-pic, libc6-udeb, libc6.1, libc6.1-dev, libc6.1-dbg, libc6.1-prof, libc6.1-pic, libc6.1-udeb, libc0.3, libc0.3-dev, libc0.3-dbg, libc0.3-prof, libc0.3-pic, libc0.3-udeb, libc0.1, libc0.1-dev, libc0.1-dbg, libc0.1-prof, libc0.1-pic, libc0.1-udeb, libc6-i386, libc6-dev-i386, libc6-sparc64, libc6-dev-sparc64, libc6-s390x, libc6-dev-s390x, libc6-amd64, libc6-dev-amd64, libc6-powerpc, libc6-dev-powerpc, libc6-ppc64, libc6-dev-ppc64, libc6-mipsn32, libc6-dev-mipsn32, libc6-mips64, libc6-dev-mips64, libc0.1-i386, libc0.1-dev-i386, libc6-sparcv9b, libc6-i686, libc6-xen, libc0.1-i686, libc0.3-i686, libc0.3-xen, libc6.1-alphaev67, libnss-dns-udeb, libnss-files-udeb
...

eglibc is a source package that generates many binary packages :)

So we've seen source packages and binary packages. Maybe it would make sense to count the size of Debian in terms of source packages, but that'd make Debian feel much smaller :-), like only 15000 packages or so, which is still a lot...

Debtags

Let's look at the Tag: header. It was introduced to help deal with a large number of packages.

In the past there was only the Section: header, which still exists: you also see it in apt-cache show. Section: is limited, in that one package can only be in one section. Would you put Evolution in the mail section or in the gnome section? Both would be appropriate. So we started working on Debtags as a way to have a far better category system.

You can see that the Tag: header has several tags, not just one, but Debtags is not just multiple sections.

Every tag is made of two parts, separated by ::. For example, the debtags package is tagged role::program. The first part identifies a group of similar tags which, following library science usage, is called a facet. So, role:: is the group of all roles a package can have in the system: program, library, documentation, plugin and so on.
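
As a small illustration of the facet::tag structure, this Python sketch groups the tags of a Tag: header by facet. The function name is invented for the example. Splitting on the first :: only matters for tags such as works-with::image:raster, where the tag part itself contains a colon.

```python
from collections import defaultdict


def group_by_facet(tag_field):
    """Group the tags of a Tag: header value by their facet."""
    facets = defaultdict(list)
    for tag in tag_field.split(","):
        tag = tag.strip()
        if not tag:
            continue
        # Split on the first "::" only: the tag part may contain ":".
        facet, sep, name = tag.partition("::")
        if not sep:
            raise ValueError(f"not a facet::tag pair: {tag!r}")
        facets[facet].append(name)
    return dict(facets)


TAGS = "role::program, interface::x11, works-with::image, works-with::image:raster, use::editing"

for facet, names in group_by_facet(TAGS).items():
    print(facet, "->", ", ".join(names))
```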

There is a fantastic web page to browse all available tags: http://debtags.alioth.debian.org/vocabulary/. You can see all the facets (groups of tags) and, clicking on a facet, you can see all the options for that group (all the tags for that group).

There are 620 different tags available at the moment, quite a lot. If we didn't have the groups, it'd be really complicated to keep track of them. Each group is a different point of view from which we look at Debian. This is called Faceted Classification and the Debtags simplification of it is described here: http://debtags.alioth.debian.org/paper-debtags.html#debtags-theoretical-foundations. The theory behind it is fascinating, but I won't go into it now.

There are many things in the Debtags project that are worth looking into: one of them is the idea of looking at Debian from different points of view. We like to say that Debian is the universal operating system, but saying that you can do everything with Debian is not really helpful to somebody with a specific need. So, by using a group of tags we can give examples of what is available for a given field.

See the Accessibility Support group of tags, Biology, Software Development, Games and Amusement, Security, World Wide Web... They are all examples of how rich Debian is.

Debtags is designed so that there are at least 7 packages for each tag. This makes tags very concrete, they really represent some bit of Debian (7 comes from http://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two).

Editing Debtags

Obviously, we cannot ask Debian developers to learn how to use 620 tags for their packages. We could ask them to, but we can't expect them to actually do it well.

Also, users can be better taggers than developers, because they can be field experts in a way the developer is not. Every IT person who worked with very specialised customers is well aware of this. It's common to be asked to write, debug or package software that does things that one cannot understand (at least, it happens to me a lot).

So, tagging is done in a wiki-like way: if you go to http://debtags.alioth.debian.org/todo.html you see a list of packages that need tagging. Click on a package and you'll get the tag editor. The editor is a web application that allows anybody to edit the tags of a package. It has interesting features: for example, it tries to suggest tags or ways to improve the classification of a package.

Debian Developers are of course encouraged to have a look at their packages: in http://qa.debian.org/developer.php?login=enrico for example, you can find a Debtags link that takes you to a per-developer tagging TODO-list page. http://debtags.alioth.debian.org/todo.html?maint=enrico%40debian.org is mine. Oh dear! The interface is telling me off, I should fix some of them...

Note it says things like "There is a 95.4% chance that the tag devel::library is missing". It uses the same algorithms supermarkets use to suggest products for you to buy :) but I digress...
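
The text does not say exactly which algorithm the tag editor uses, but the supermarket comparison suggests something like association rules: "packages that have tag A usually also have tag B". Under that assumption, here is a toy Python sketch of how a percentage like the one above could be computed. The function name and the package names are hypothetical, and the real Debtags code may work quite differently.

```python
def confidence(packages, given, missing):
    """Fraction of packages carrying all tags in `given` that also carry `missing`."""
    with_given = [tags for tags in packages.values() if given <= tags]
    if not with_given:
        return 0.0
    with_both = [tags for tags in with_given if missing in tags]
    return len(with_both) / len(with_given)


# Hypothetical packages with made-up tag sets.
PACKAGES = {
    "libfoo-dev": {"role::devel-lib", "devel::library"},
    "libbar-dev": {"role::devel-lib", "devel::library"},
    "libbaz-dev": {"role::devel-lib"},
    "foo": {"role::program"},
}

chance = confidence(PACKAGES, {"role::devel-lib"}, "devel::library")
print(f"There is a {chance:.1%} chance that the tag devel::library is missing")
```

In this toy data set two of the three packages tagged role::devel-lib also carry devel::library, so the third one gets flagged with about a 67% chance.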

Indeed everybody can edit tags: go to http://debtags.alioth.debian.org/edit.html, pick a package and play with it.

There is a When done: [Submit] button that does just that: it saves your edits in the Debtags database, with no authentication of any kind. The idea is: you see an issue in the tagging of a package, you go there and fix it.

Package information tools

apt-xapian-index

An interesting newish piece of software is apt-xapian-index. It maintains another index of package information on your system, in /var/lib/apt-xapian-index/. It does not replace apt's index, but adds to it: it's designed to support higher-level queries.

It cannot, however, be used for installing packages because it cannot do dependency resolution (apt does that well, so why reimplement it?).

axi-cache is a tool that uses apt-xapian-index. For example:

$ axi-cache search image editor
37 results found.
Results 1-20:
100% showfoto - image viewer/editor for KDE
95% isomaster - A graphical CD image editor
88% openshot - Create and edit videos and movies
87% kimagemapeditor - HTML image map editor
87% fotoxx - easy-to-use digital photo editor
86% bluefish - advanced Gtk+ HTML editor
83% bluefish-data - advanced Gtk+ HTML editor (data)
81% bluefish-plugins - advanced Gtk+ HTML editor (plugins)
81% tea - text editor with syntax highlighting & UTF support
81% bluefish-dbg - advanced Gtk+ HTML editor (debugging symbols)
79% xfe - A lightweight file manager for X11
76% fontforge - font editor
76% xpaint - simple paint program for X
73% pixmap - A pixmap editor
73% gimp - The GNU Image Manipulation Program
73% geeqie - image viewer using GTK+
71% geeqie-gps - image viewer using GTK+ (with support for GPS maps)
71% imagej - Image processing program inspired by NIH Image for the Macintosh
70% mypaint - Paint program to be used with Wacom tablets
70% gnome-paint - simple, easy to use paint program for GNOME
More terms: thumbnail bluefish paint edit wizards viewer gtk+
More tags: works-with::image use::editing x11::application role::program interface::x11 works-with::image:raster scope::application
`axi-cache more' will give more results

It will also suggest terms to improve the search, show a little tag cloud of extra tags you could use (text only, so somewhat simplified) and it will also do spell checking:

$ axi-cache search firefax
...
Did you mean: firefox ?
...

It has really nice tab completion (dapal being the bash-completion maintainer as well as an extremely helpful fellow). For example:

  • axi-cache search <TAB> will start suggesting you tags

  • axi-cache search image <TAB> will suggest image-related keywords, and so on

A really interesting feature of apt-xapian-index is that it can index all sorts of package information, even things that are not found in the Packages file. One can implement more indexing features via plugins.

It's also self-documenting: every indexing run generates an updated version of /var/lib/apt-xapian-index/README which documents what is in the index. So debtags tags are indexed for fast lookup in the apt-xapian-index index. That's why axi-cache can generate tag clouds and suggest tags so quickly. (I want a tag cloud in every graphical package manager! We're almost in 2011!)

Extra information you find in apt-xapian-index:

  • newness of a package

  • GUI menu entries for applications provided by this package (and their icons)

  • translated package descriptions

For example, you can look for all packages that provide an application in a menu entry. I used this feature to implement fuss-launcher (http://www.enricozini.org/2010/debian/fuss-launcher/), which was interesting, because it uses Debian package information to look, not for packages, but for programs to run. Ideally you could write an application launcher that shows, grayed out, matching applications that are not installed; then you could ask for information about them, and ask it to install them.

All the data is there, indexed and queryable in a very fast way.

newness of a package is a very new feature. In a nutshell, every time apt-xapian-index sees a package that wasn't there before, it takes note of the date. So you could search or sort packages by how recently they appeared on your system, like the New packages view of aptitude, but with history: what was that package that was new last week? There are currently no UIs I know of that use this information, but the data is there, ready to be used.

newness is not information about a package per se, but more like information about a package in a specific system.
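
The bookkeeping behind newness can be sketched in a few lines of Python: keep a map from package name to the first time it was seen, and never overwrite an existing entry. The function names and the timestamps are invented for this example; it only illustrates the idea, not how apt-xapian-index stores the data.

```python
def record_first_seen(first_seen, current_packages, now):
    """Remember `now` for every package that was not known before."""
    for name in current_packages:
        # setdefault keeps the older timestamp for already-known packages.
        first_seen.setdefault(name, now)
    return first_seen


def newest(first_seen, limit=5):
    """Package names sorted from most to least recently first seen."""
    return sorted(first_seen, key=lambda name: first_seen[name], reverse=True)[:limit]


# Two hypothetical indexing runs, with made-up timestamps.
seen = {}
record_first_seen(seen, ["debtags", "apt-file"], now=1000)
record_first_seen(seen, ["debtags", "apt-file", "goplay"], now=2000)

print(newest(seen, limit=1))
```

Because the dates are local to the machine doing the indexing, two systems will generally disagree about which packages are new.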

Other similar information includes "is the package installed?" and "was the package installed automatically, or was it explicitly requested by the user?". You usually find these in aptitude or apt.

If you have popularity-contest installed, you get /var/log/popularity-contest with information about when you last used every package on your system. Using that information, it'd be trivial to write a script that shows you the packages you have installed but never used. (I need a plugin to get that information into apt-xapian-index, so that one can sort the axi-cache results by "when did I last use it".)
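
As a rough idea of what such a script could look like, here is a Python sketch. It rests on an assumption about the log format that you should check against your own /var/log/popularity-contest: each data line is taken to be "atime ctime package file [tag]", an access time of 0 is taken to mean never used, and lines tagged <NOFILES> are skipped because nothing can be said about them. The function name and the sample log are made up.

```python
def never_used(log_text):
    """List packages whose recorded access time is 0 (assumed log format)."""
    unused = []
    for line in log_text.splitlines():
        fields = line.split()
        # Skip header, footer and anything that is not a data line.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        atime, _ctime, package, *rest = fields
        if "<NOFILES>" in rest:
            continue  # no files to check, so usage is unknown
        if int(atime) == 0:
            unused.append(package)
    return unused


# Made-up sample in the assumed format; package names are hypothetical.
SAMPLE_LOG = """\
POPULARITY-CONTEST-0 TIME:1292457600 ID:0000 ARCH:amd64 POPCONVER:0.0
1292400000 1290000000 debtags /usr/bin/debtags
0 1290000000 hypothetical-unused /usr/bin/hypothetical
0 0 hypothetical-data <NOFILES>
END-POPULARITY-CONTEST-0 TIME:1292457600
"""

print(never_used(SAMPLE_LOG))
```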

I mentioned that apt-xapian-index knows what applications are provided by a package, even for packages that are not installed: it can do so thanks to the information in the app-install-data package, which contains a copy of the .desktop files shipped by any package in Debian. It's used to implement "find more applications for this menu" kinds of features.

goplay

goplay is a wonderful little program that shows off the many different points of view idea (thanks to Miriam Ruiz). Here is a screenshot: http://www.miriamruiz.es/img/goplay-1.0_screenshot.png.

It is a program to find packages, but only game packages: it can show screenshots and lets you filter by game category. It's a program that does more with less: it hides some information to show only the information that really matters in a given field.

In the package goplay there are also goadmin, golearn, gosafe and goweb, which are similar to goplay but show a different point of view (for example, system administration).

See also the "Sexual Content and Violence Content facets" question.

Popularity Contest

See http://popcon.debian.org/.

For example: http://qa.debian.org/popcon.php?package=debtags shows some statistics of how many people have that package installed.

It has all sorts of biases, but it's a way to implement a sort by popularity feature in a package manager. Such a feature has not happened yet because there is still no proper way to acquire that information on a Debian system.

I'd like to have it done at apt-get update time, maybe with a file on the mirrors next to the Packages file; that would be the proper way to do it, but it would require coordination among several busy people in Debian. Still, it's on my wishlist of things to maybe tackle at some DebConf.

Dose Tools

The Dose Tools (https://qa.debian.org/dose/) were created as a successor to EDOS Debian Weather. They provide data about packages that are impossible to install (due to dependency and conflict incompatibilities). They also list packages that are obsolete (depending on removed versions of packages) and packages that conflict at the file level (trying to own a file also owned by another package when the two packages can be installed together).

They put together some really smart algorithms for checking dependencies, and as a demo they compute how installable Debian is on any given day (https://qa.debian.org/dose/debcheck.html):

  • If most packages can be installed fine, they show a sunny icon.
  • If a moderate number of packages are uninstallable due to broken dependencies, they show rain.
  • If there are a lot of broken packages today, maybe because there is some transition mess going on in sid, they show a thunderstorm icon.

So you can check what the weather is like before running dist-upgrade: genius!

apt-file

Another information source: apt-file. You can use it to search the contents of packages.

For example, you hear a friend say "ah, you can do that by running foo". You run foo and you get Command not found: what package contains foo?

$ apt-file search foo

will tell you.

It uses the Contents files in the Debian mirrors. If you look at http://ftp.debian.org/debian/dists/squeeze/ you'll see the Contents files. They're very big, as they list the name of every file provided by every package.
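To give an idea of what a search over such a file involves, here is a minimal sketch. It assumes each line holds a file path followed by a comma-separated list of section/package entries, and that any header has already been stripped. The function name and the sample lines are invented for illustration; this is not how apt-file itself is implemented.

```python
def find_packages(contents_lines, pattern):
    """Return the sorted names of packages shipping a path that contains pattern."""
    found = set()
    for line in contents_lines:
        # The package list is the last whitespace-separated column;
        # the path itself may contain spaces, so split from the right.
        parts = line.rstrip("\n").rsplit(None, 1)
        if len(parts) != 2:
            continue
        path, locations = parts
        if pattern in path:
            for location in locations.split(","):
                # Drop the section prefix, e.g. "utils/foo-tools" -> "foo-tools".
                found.add(location.rsplit("/", 1)[-1])
    return sorted(found)


# Made-up sample lines in the assumed format.
sample = [
    "usr/bin/foo                 utils/foo-tools",
    "usr/share/doc/foo/README    utils/foo-tools,doc/foo-doc",
    "usr/bin/bar                 misc/bar",
]

print(find_packages(sample, "bin/foo"))
```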

Before you can use apt-file, you need to run:

$ apt-file update

which will download the right Contents files for your system.

If you're in a hurry, you can also use rapt-file, which is also in the apt-file package. The r stands for remote. So if you want to find out which package provides GNU R, and apt-cache search r is not very helpful, you can use:

$ rapt-file search bin/R

Alternatively, you can use axi-cache search r: it is smart enough to do the right thing.

Package Tracking System

The Package Tracking System (http://packages.qa.debian.org) is a tool to track everything about a package.

If you look, for example, at http://packages.qa.debian.org/d/debtags.html you'll find a page with the package status and all sorts of links to every piece of information available about it.

At the bottom left of the page there is a little half-hidden box where you can add your e-mail address to be kept in the loop about many things that happen to the package. The little selection next to the e-mail field has three options:

  • sub for subscribe

  • unsub for unsubscribe

  • opts for subscription options.

Here you have a list of all available subscription options:

List of accepted keywords:
 bts: All bug reports and associated discussions.
 bts-control: Notifications about status changes of bug reports.
 cvs: Commit notices of the VCS repository associated to the package (it's not limited to cvs, it might be anything).
 upload-source: Notifications of new sourceful uploads.
 upload-binary: Notifications of binary-only uploads (uploads of buildd mainly).
 summary: Regular summary mails about the status of the package. Currently, these report on transition into testing, and new upstream versions available.
 contact: Mails from people contacting the maintainer via package@packages.debian.org.
 default: Mails manually sent to the PTS address.
 buildd: Notifications of build failures from build daemons.
 derivatives: Information about changes made to this package by derivatives (e.g. Ubuntu).
 derivatives-bugs: Bug traffic about this package in derivative distributions (e.g. Ubuntu).
 katie-other: Other mails sent by dak (the software running the archive).
 ddtp: Translations of the package's description created by DDTP contributors.

It is a really nice tool. For example, you can get a copy of all mails reporting a new bug in a package, or a mail with the changelog of every new version of the package uploaded to Debian.

Debian Data Export

Still a bit of a work in progress, we have Debian Data Export (http://dde.debian.net/dde/), a web application that makes it easy to download information about Debian packages. It is currently used as the remote backend for rapt-file.

For example, http://dde.debian.net/dde/q/bts/bynumber/123456 will give you all available information about Debian bug 123456.

By default, it shows a page with a bit of documentation, but you can add ?t=FORMAT and it will give you the same information in a format of your choice: for example, http://dde.debian.net/dde/q/bts/bynumber/123456?t=json

http://dde.debian.net/dde/ lists the formats that are available, currently:

  • JSON: The JSON export is interesting because a DDE plugin can become the backend for a JavaScript web application.
  • YAML
  • CSV
  • Python Pickled objects.
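Building such query URLs is easy to script. The sketch below only assembles the URL and does not contact the server. The helper name `dde_url` is hypothetical, and `json` is the only format name confirmed by the example above; the exact spelling of the other format names is not assumed here.

```python
from urllib.parse import urlencode

# Base of the query interface, taken from the example URLs above.
DDE_BASE = "http://dde.debian.net/dde/q"


def dde_url(path, fmt=None):
    """Build a DDE query URL, optionally asking for a specific output format."""
    url = DDE_BASE + "/" + path.strip("/")
    if fmt:
        # The format is selected with the ?t=FORMAT query parameter.
        url += "?" + urlencode({"t": fmt})
    return url


print(dde_url("bts/bynumber/123456", "json"))
```

The resulting string can then be passed to any HTTP client.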

More tools

There is obviously more information about packages: most of it you can find in UDD if you know SQL.

For example: bug reports, or all sorts of information collected by the Debian-QA project. OK, that's a general idea of the information about Debian packages. There is also quite a bit of information about packagers :)

DDPortfolio is a very good index: you can use it to look up everything known about a Debian Developer (people in Front Desk use it quite a bit :)

Conclusion

A personal reflection of mine: we have way more information than we currently show. There is an incredible number of neat applications that can be built on it.

I hope this trip can inspire more such applications to appear :)

Questions and answers

QUESTION: But (Homepage and Tag fields) were not present before?

ANSWER: Homepage and Tags are recent additions. Recent as in, 2 or 3 years IIRC.

QUESTION: Is that (Vcs-*) an upstream source or a Debian source?

ANSWER: It's the Debian source. There is a difference because often the Debian developers have a version control system where they do the packaging, which is not necessarily the same one used by the software author.

QUESTION: One source package can generate many binary packages... I don't understand this.

ANSWER: Good question. Think of a source package as the real software you find on the internet, for example OpenOffice or Firefox. We normally have one source package for each of them, but when it is compiled, the build system generates lots of different binary packages, because we don't always want to install all of OpenOffice/LibreOffice, or all the translations of Firefox.

So if you run apt-cache search openoffice.org you'll find lots of binary packages, and likely they're all pieces of the single big source.
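The relationship can be made concrete by grouping binary packages by their Source: field. The sketch below assumes Packages-style stanzas separated by blank lines, where a missing Source: field means the source has the same name as the binary package. The function name and the bigsuite package names are invented for illustration.

```python
from collections import defaultdict


def binaries_by_source(packages_text):
    """Group binary package names by the source package they are built from."""
    groups = defaultdict(list)
    for stanza in packages_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines (they start with whitespace).
            if line[:1].isspace() or ":" not in line:
                continue
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
        name = fields.get("Package")
        if not name:
            continue
        # Source: may carry a version in parentheses, e.g. "bigsuite (1.0-2)".
        source = (fields.get("Source") or name).split()[0]
        groups[source].append(name)
    return dict(groups)


# Made-up stanzas; real Packages files carry many more fields.
sample = """\
Package: bigsuite-writer
Source: bigsuite

Package: bigsuite-calc
Source: bigsuite (1.0-2)

Package: bigsuite
Description: metapackage
 with a continuation line
"""

print(binaries_by_source(sample))
```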

QUESTION: Is there a command to search for tags? Say, if I search for a media player I do: apt-cache search audio.

ANSWER: Very good question. Having so many (620) different tags calls for a search system for tags. Over time we put together some interestingly scary smart algorithms to find tags.

One you can see in axi-cache: if you have it installed, you can run, for example,

$ axi-cache search --tags image editor

and it will give you a list of tags that could be related to those keywords.

axi-cache comes with the package apt-xapian-index: it is installed by default on many systems, but not all of them. More on that later.

QUESTION: Do you think that, in the future, the Section: header could disappear (and only debtags be used)?

ANSWER: Probably not. Sections will be around for quite a while.

For example, Section: oldlibs is used to automatically track packages that need to be ported to newer libraries. There is a big difference between Section: and Tag:. Section: is maintained by ftp-master, while Tag: is maintained by developers and users. So Section: is a field that can be used to make important decisions about a package, because its editing is much more controlled. Section: is going to be used to track the state of a package in Debian, and Tag: to find a package in Debian. I see them evolving in different directions, although I reckon this is a rather subtle distinction at this stage.

QUESTION: The goplay screenshot shows Sexual Content and Violence Content facets, but I can't find them in the Debtags - Vocabulary Browser, why?

ANSWER: That was an experiment by Miriam, and a very big piece of work, actually. debtags supports external tag sources, listed in /etc/debtags/sources.list. It will download them and merge them, similarly to what apt does with package information.

This can be used to provide tags that Debian cannot maintain in a standard way. For example, many people disagree on the methods to rate a game by violence or sexual content. While I don't feel confident in picking one method and making it Universal by adding the information in vanilla Debtags, I'm very happy to allow the content to be merged to a system if the user wants. http://www.miriamruiz.es/weblog/?p=69 is some information from Miriam about the project.

I personally haven't heard news about the game rating project for quite a while, and I lost the link to the debtags source to use for it. We had the idea to ship the ratings in a Debian package that one can install and that provides the extra bit of configuration for Debtags.

(Someone should chase Miriam up, and maybe offer her help: it was the main example of external tag data that can be optionally included in a Debian system and I'd hate to lose it)

Another example use of external tag sources is to make Debian scale *down* to an organisation. For example, a network of schools can maintain its own tag database with things like school::teacher, school::primary-education, school::science-lab and so on. I think the Fuss project (http://fuss.bz.it/) played with the idea some time ago, but I don't remember if they eventually deployed it. (The Fuss project is a Debian blend for the Italian speaking minority schools in the German speaking area of Italy)
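The merging of an external tag source into the main one can be sketched like this. It assumes a simple text format with one "package: tag, tag" entry per line. The function names are hypothetical, the tags shown are only illustrative, and this is not the code debtags actually runs.

```python
def parse_tags(lines):
    """Parse 'package: tag1, tag2' lines into a {package: set_of_tags} mapping."""
    db = {}
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only: tags themselves contain "::".
        package, tags = line.split(":", 1)
        tag_set = {t.strip() for t in tags.split(",") if t.strip()}
        db.setdefault(package.strip(), set()).update(tag_set)
    return db


def merge_sources(*sources):
    """Merge several tag databases; a package gets the union of its tags."""
    merged = {}
    for source in sources:
        for package, tags in source.items():
            merged.setdefault(package, set()).update(tags)
    return merged


# Illustrative data: a main source and a school-specific one.
main = parse_tags(["tuxmath: use::learning, interface::x11"])
school = parse_tags(["tuxmath: school::primary-education"])

print(sorted(merge_sources(main, school)["tuxmath"]))
```

Taking the union keeps the main tags untouched, so removing the external source later simply removes its extra tags.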

QUESTION: I see the debtag devel::lang:c. Is "lang" a kind of "subfacet"?

ANSWER: Well spotted. I really want to keep the structure of debtags as just 2 levels: facet and tag. We tried trees and gave up because they are extremely difficult to maintain. But sometimes we end up having little groups inside a facet, like in devel::lang:c; it's convenient in that case, but not something I'd like to encourage. So, I don't like to think of "subfacets" or "subtags".
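The two-level structure means a full tag name splits on its first "::" only, and whatever follows belongs to the tag. A minimal sketch (the function name is an assumption):

```python
def split_tag(full_tag):
    """Split 'facet::tag' into (facet, tag), using only the first '::'."""
    facet, sep, tag = full_tag.partition("::")
    if not sep or not facet or not tag:
        raise ValueError("not a facet::tag name: %r" % full_tag)
    return facet, tag


# The "lang:c" part stays inside the tag; there is no third level.
print(split_tag("devel::lang:c"))
```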

QUESTION: If anybody can add tags, is there no spam, or false tags?

ANSWER: Thanks, good question. SPAM is not an issue, because there is no way to send email or even to enter text contents like advertisement: the only thing you can do is add and remove tags.

There is an issue of quality of course, and possibly vandalism (although if somebody wanted to vandalise debtags, I'd be impressed: there are far more visible and more rewarding things worth messing with :) In case of vandalism, we have daily backups going back to the beginning of the Debtags project: the dataset is small, so backups are cheap :)

The issue is indeed quality. Sometimes people play with the interface by clicking at random and accidentally submit. To ensure quality, all submissions are manually reviewed before entering Debian proper. They are aggregated so that they are easier to review, and the review is done by me and dapal.

Big applause to dapal for helping there.

(lots of applause ;-))

The plan is to design some interface to allow Debian maintainers to review submissions for their own packages. Something like "people think the tagging of your packages should be changed this way:". That interface is technically feasible and we have a decently good idea of how to build it, but it still needs to be written. I see it happening in a year or so, to give a rough timeframe.

QUESTION: Can you have two facets with more than one tag for a package? (I'm thinking of works-with-format::)

ANSWER: Yes, you can. Another example is the use:: facet, and the fact that a package can have many uses (think of a web browser). In fact, any attempt to add restrictions to the way tags can be used has succeeded in showing a sizable number of unexpected corner cases where the rule would need to be broken. Therefore it just makes sense to have no restrictions except common sense.

QUESTION: Are you looking for volunteers to review submissions?

ANSWER: Always looking for volunteers there :) Beware the current procedure is... special. So I'm not too actively advertising the need for volunteers because I'm not sure I feel comfortable asking people to do it the way I do it, and I can't think of any better way that can be quickly put into place.

For that reason I'm very interested in building new "allow people to review" interfaces.

But in the meantime, by all means if you'd like to get your hands dirty in it you'd make me very happy.

QUESTION: Could you give us one example (a name or a link) of an algorithm used to implement facets?

ANSWER: The supermarket suggestion algorithm used to give some tagging suggestions is here: http://www.borgelt.net/apriori.html. The algorithm behind the smart tag search in axi-cache search --tags is described at http://www.enricozini.org/2007/debtags/axi-query-tags/.
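To give a feel for the "supermarket" idea (packages that have this tag often also have that one), here is a deliberately simplified sketch. It only counts pairs of tags and is not the Apriori implementation linked above, which handles larger itemsets and is far more efficient. The function name, the threshold and the sample data are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def suggest_tags(tagged_packages, current_tags, min_confidence=0.6):
    """Suggest tags that often co-occur with the tags a package already has.

    confidence(a -> b) = packages with both a and b / packages with a
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_packages:
        tags = set(tags)
        tag_count.update(tags)
        # Count every ordered pair (a, b) with a != b once per package.
        pair_count.update(permutations(sorted(tags), 2))

    suggestions = {}
    for (a, b), together in pair_count.items():
        if a in current_tags and b not in current_tags:
            confidence = together / tag_count[a]
            if confidence >= min_confidence:
                suggestions[b] = max(confidence, suggestions.get(b, 0.0))
    # Best suggestions first; ties are ordered alphabetically.
    return sorted(suggestions, key=lambda t: (-suggestions[t], t))


# Made-up tag sets standing in for already tagged packages.
corpus = [
    {"use::editing", "works-with::image", "interface::x11"},
    {"use::editing", "works-with::image"},
    {"use::editing", "works-with::text"},
    {"use::viewing", "works-with::image", "interface::x11"},
]

print(suggest_tags(corpus, {"works-with::image"}))
```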

QUESTION: Is partial tagging better than no tagging, or is it better not to add a few tags to a package if one is not sure it is missing some other tag?

ANSWER: Partial tagging is better than no tagging. The wiki philosophy works: you do your bit, someone else will do their bit. There are special::not-yet-tagged tags in the web interface; removing those means one considers the package acceptably tagged. Worst case, you can add some tags but leave it as not yet tagged.

Another interesting bit of the not-yet-tagged tags is that they are used to keep robots away. There are tagging robots that use heuristics on package information to decide that some tags could be added, but they only work on packages that have not-yet-tagged tags attached. Only a human would remove the not-yet-tagged tags, so the tagging robots will respect the superior intelligence of humans and stop interfering :) Cool :)
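The rule the robots follow can be sketched in a few lines. The function names and the sample heuristic are hypothetical; the only thing taken from the description above is that a robot touches a package only while it still carries a not-yet-tagged tag.

```python
NOT_YET_TAGGED = "special::not-yet-tagged"


def robot_may_tag(tags):
    """A robot may only touch packages still carrying a not-yet-tagged tag."""
    return any(tag.startswith(NOT_YET_TAGGED) for tag in tags)


def apply_robot(db, heuristic):
    """Add heuristic tags to robot-eligible packages; leave the rest alone."""
    result = {}
    for package, tags in db.items():
        if robot_may_tag(tags):
            result[package] = set(tags) | set(heuristic(package))
        else:
            result[package] = set(tags)
    return result


# Hypothetical heuristic: packages named lib* get a library role tag.
def guess(package):
    return {"role::shared-lib"} if package.startswith("lib") else set()


db = {
    "libfoo1": {"special::not-yet-tagged"},
    "libbar2": {"devel::library"},  # reviewed by a human: robots keep out
}
print(apply_robot(db, guess))
```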

QUESTION: Can we use UDD to search for a tag like Implemented in C?

ANSWER: I believe there is a debtags table in UDD, yes.

For those who haven't heard of it: UDD is the Ultimate Debian Database, a big source of information about Debian. The UltimateDebianDatabase page describes it.

QUESTION: Does the xapian in apt-xapian-index have a meaning? I have trouble remembering the name... knowing something about xapian might help.

ANSWER: That is a very good point. It's called Xapian because it's built on the Xapian indexing system http://xapian.org/. Unfortunately I don't know why they chose that name for their project. In hindsight, apt-xapian-index should have had some more memorable name.

The idea was to not require users to install that package explicitly, but to have it as a dependency of high level package managers. For example, goplay depends on apt-xapian-index.

QUESTION: I ran across special signs, where apt-cache failed (often a + sign). Is axi-cache a way out?

ANSWER: Good question. axi-cache delegates most of the indexing and query parsing to Xapian, so it boils down to how Xapian treats special signs. It looks like the + sign is handled properly: at least axi-cache search a+ finds the A+ programming language.

I wouldn't know for sure about other characters, at least not without looking up the documentation of Xapian's TermGenerator and QueryParser. Talking about QueryParser documentation, http://xapian.org/docs/queryparser.html is a good piece of documentation for axi-cache.

You can for example do axi-cache search mail AND NOT implemented-in::php

See also