Title: Extend Debian Package Metadata and add AppStream Features
DEP: 11
State: CANDIDATE
Date: 2014-08-02
URL: http://wiki.debian.org/DEP-11
Source: http://wiki.debian.org/DEP-11?action=info
Drivers: Matthias Klumpp <mak@debian.org>,
 Julian Andres Klode <jak@debian.org>,
 Michael Vogt <mvo@debian.org>
License: GPL-3
Abstract:
 Proposal to enhance metadata exposed by Debian packages, as
 well as adding new files to Debian repositories which provide
 all data required for the cross-distro application manager
 project AppStream.

This is an updated draft of DEP-11! If you want the old version instead, please look here.

This page is currently outdated and needs a rework: AppStream and DEP-11 have merged, and DEP-11 today is really only a YAML implementation of the AppStream specification (the ftp-masters team required YAML instead of XML). If you are looking for the current DEP-11 YAML specification, see freedesktop.org/software/appstream/docs/sect-AppStream-YAML.html

This page contains a proposal for how Debian could provide useful package metadata to applications and users, and implement its own version of the cross-distribution AppStream specification, so that we can get rid of big packages like "app-install-data".

The proposed solution will resolve two problems:

AppStream is a cross-distro effort to provide an application manager ("AppStore") for all distributions, with advanced features like ratings & reviews. It uses the well-known Open Collaboration Service API to achieve this. The project should also improve collaboration with upstreams as well as with other distributions (packages can be compared easily using this metadata, which is useful for e.g. sharing patches). More detailed information about AppStream can be found at Freedesktop.Distributions.

Component metadata in binary packages

Additions to the "Provides" field

We propose adding component metadata to the "Provides" fields of binary Debian packages, which may be referenced by the "Suggests" fields of other packages. (No other fields should make use of these metadata elements, as they are likely to be provided by multiple packages and might cause a lot of trouble.)

Components are, for example, shared libraries, pkg-config files, KDE Plasma data providers, fonts, codecs, firmware, Perl modules, Python modules, Haskell modules, printer drivers, etc. This information can be used by applications to automatically load missing functionality on request. This is already implemented for RPM-based distributions using PackageKit. Applications trying to use PackageKit to search for components on Debian currently get very bad results due to missing metadata. (At the moment, we guess the components from package names.)

Metadata is defined in the form "type<name>", for example:

   Provides: exec<foobar>, lib<libfoo.so.2>, python3<foobar>, mime<text/x-foobar>

By doing this, people can instruct apt-get to install a missing mime-type, a missing library, a Perl module or anything else which is a supported component type. The component metadata types cannot be confused with normal package names, because angle brackets are not allowed in normal package names. The "Provides" line should only reference components (= resources) this package supports, and not applications, since doing that would result in a large mess.
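Because angle brackets never occur in package names, a tool could separate component entries from ordinary package names mechanically. The following is a minimal sketch, assuming entries are comma-separated as in the example above. The function name split_provides and the exact pattern for type names are assumptions made for illustration, not part of Apt or any existing tool.

```python
import re

# Matches "type<name>", e.g. "lib<libfoo.so.2>". The character set allowed
# for the type part is an assumption; the proposal does not define it.
COMPONENT_RE = re.compile(r"^([a-z0-9][a-z0-9-]*)<([^<>]+)>$")


def split_provides(value):
    """Split a Provides value into (packages, components).

    'components' is a list of (type, name) tuples; everything that does
    not use the type<name> form is treated as a plain package name.
    """
    packages, components = [], []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = COMPONENT_RE.match(entry)
        if match:
            components.append((match.group(1), match.group(2)))
        else:
            packages.append(entry)
    return packages, components


pkgs, comps = split_provides(
    "foobar-common, exec<foobar>, lib<libfoo.so.2>, python3<foobar>, mime<text/x-foobar>"
)
print(pkgs)
print(comps)
```

Here "foobar-common" is a made-up virtual package name, added only to show that ordinary entries pass through unchanged.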

A list of possible components can be found here. We suggest the following component-types to be supported:

Supported component types

Type-Name: exec
Description: Any executable this package provides in /usr/bin. (This would also solve Bug #638517 in a nice way.)
Example: exec<nano>, exec<vim>, exec<iceweasel>

Type-Name: plasma-service
Description: The Plasma services (KDE4 desktop) this package provides.
Example: plasma-service<weather>

Type-Name: lib
Description: Lists all shared libraries this package provides.
Example: lib<libprojectM.so.2>, lib<libogg.so>, lib<some-gstreamer-plugin.so.1>

Type-Name: python2
Description: A list of all Python 2 modules this package provides.

Type-Name: python3
Description: A list of all Python 3 modules this package provides.

Type-Name: mime
Description: The "MimeType" field of the .desktop file, one mime entry per mime type. This component indicates if the program in this package is able to handle the specific mime type.
Example: mime<text/plain>

Type-Name: modalias
Description: A list of "modalias" globs representing the hardware types (for example USB, PCI, ACPI, DMI) this package handles. Useful for installing printer drivers, USB protocol drivers for smartphones, firmware, or kernel drivers which are not yet merged upstream.

Type-Name: firmware
Description: A list of firmware files included in the package, to make it possible to find the right firmware package to install for a given kernel driver. The value is the path below /lib/firmware to the firmware file in question, like the firmware value exported from Linux kernel modules.
Example: firmware<ipw2200-bss.fw>, firmware<vcis/COMpad2.cis>
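To illustrate how the modalias type could be used, a client could compare the alias string of a detected device against the globs each package declares. The following is a minimal sketch using shell-style matching from Python's fnmatch module. The package names, the globs, and the device alias are all invented for illustration, and the function packages_for_device is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical index: package name -> modalias globs it declares via
# modalias<...> components. All entries are invented for illustration.
MODALIAS_INDEX = {
    "printer-driver-example": ["usb:v04A9p10*"],
    "firmware-example": ["pci:v00008086d00004220*"],
}


def packages_for_device(modalias, index=MODALIAS_INDEX):
    """Return the sorted names of packages with a glob matching the alias."""
    return sorted(
        package
        for package, patterns in index.items()
        if any(fnmatchcase(modalias, pattern) for pattern in patterns)
    )


# An alias string in the style used for PCI devices (made up here).
device = "pci:v00008086d00004220sv00008086sd00002701bc02sc80i00"
print(packages_for_device(device))
```

fnmatchcase is used rather than fnmatch so that matching stays case-sensitive on every platform.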

Other potential components

Proposal for AppStream support

The AppInfo index file

We propose the addition of a new index file for application installers such as Software Center, in order to replace the manually maintained app-install-data package. That package needs to mirror and scan large portions of the archive in order to build, and is thus not very up-to-date.

Moving the metadata to the server side allows for up-to-date information about the available applications. The metadata we need is the information from the .desktop files in /usr/share/applications/ together with the icons used for those .desktop files.

Proposed format

AppStream suggested the use of an XML file for providing the meta information. As Debian does not use XML anywhere else in the archive, and we do not expect anyone in Debian to like XML, we propose to use a simple YAML-based format for the index.

Here is a list of the fields we propose for this index file.

Fields

A block needs to contain a "Package" field and an "Architectures" field, as well as an "Application" and a "Name" field. All other fields are optional (but recommended).

Field-Name: Package
Description: The same as in a Packages file

Field-Name: Version
Description: The same as in a Packages file. Used to associate the components described with exactly one .deb package (and to display a version for applications).

Field-Name: Architectures
Description: Contains all architectures this package has been built for (amd64, i386, armel, kfreebsd-*, etc.). This is an additional field not present in the AppStream specification, included here to uniquely associate a .desktop file with exactly one .deb file. It also makes it possible to exclude metadata if a package is not (yet) present on one architecture, avoiding lots of duplication and wasted disk space.

Field-Name: Application
Description: Name of a .desktop file, which should serve as a unique identifier. In practice, the name of a .desktop file is not always unique, for example when packages build optimized and unoptimized versions of the application.

Field-Name: Name
Description: The "Name" field of the .desktop file

Field-Name: Name-<lang>
Description: Localized name, taken from the "Name[lang]" field of the .desktop file

Field-Name: Summary
Description: The "Comment" field of the .desktop file

Field-Name: Summary-<lang>
Description: Localized comment, taken from the "Comment[lang]" field of the .desktop file

Field-Name: Keywords
Description: The "Keywords" field of the .desktop file

Field-Name: Keywords-<lang>
Description: Localized keywords, taken from the "Keywords[lang]" field of the .desktop file

Field-Name: Icon
Description: The name of the icon, created from the .desktop "Icon" field. This one is a bit more complicated: if the original icon was a path, we need to rename it to something that is not a path, for example by replacing separators with underscores.

Field-Name: Categories
Description: The "Categories" field of the .desktop file, post-processed. The "Categories" field of a .desktop file is separated by semicolons; we probably want to use commas here.

Field-Name: Homepage
Description: As in the Packages file. This one appears in the original XML specification. We could read it from the Packages file, but it probably does no harm to copy it here, so we have all display information in one place.
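The post-processing described for the "Icon" and "Categories" fields can be sketched as follows. This is only an illustration under stated assumptions: the .desktop entry is assumed to be already parsed into a dictionary, the helper names (icon_name, categories, index_fields) are hypothetical, and joining categories with a comma followed by a space is one possible reading of "use commas".

```python
def icon_name(icon):
    """Turn a .desktop Icon value into a name that is not a path."""
    if "/" not in icon:
        return icon  # already a plain icon name
    # Drop leading/trailing separators, then replace the rest with underscores.
    return icon.strip("/").replace("/", "_")


def categories(value):
    """Convert the semicolon-separated Categories value to commas."""
    return ", ".join(part for part in value.split(";") if part)


def index_fields(package, desktop_file, entry):
    """Build some of the proposed index fields from a parsed .desktop entry."""
    fields = {
        "Package": package,
        "Application": desktop_file,
        "Name": entry["Name"],
    }
    if "Comment" in entry:
        fields["Summary"] = entry["Comment"]
    if "Icon" in entry:
        fields["Icon"] = icon_name(entry["Icon"])
    if "Categories" in entry:
        fields["Categories"] = categories(entry["Categories"])
    return fields


# Invented sample entry, as it might be read from foo.desktop.
sample = {
    "Name": "Foo",
    "Comment": "Does foo",
    "Icon": "/usr/share/pixmaps/foo.png",
    "Categories": "Network;WebBrowser;",
}
result = index_fields("foo", "foo.desktop", sample)
print(result)
```

The "Version" and "Architectures" fields are left out because they come from the package itself rather than from the .desktop file.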

Location in the Archive

We propose that the indices are provided alongside the Packages files, that is in dists/<SUITE>/<COMPONENT>/AppInfo.xy, compressed with whatever format the ftpmasters wish to use. This file will only be downloaded on demand, e.g. by the Software Center. On servers this file is not really required.

Application Icons

We also need to store the icons for the applications somewhere and provide e.g. a tarball of those icons as part of the archive. That tarball could be located at something like dists/<SUITE>/<COMPONENT>/Applications-Icons.tar.gz.

The icons in the tarball should probably be 32x32 sized and located in the "icons/32x32" sub-directory, at least according to the AppStream site. We could also deal with other schemes.

Example of Components.yml

---
File: DEP-11
Version: '0.6'
Origin: debian-sid
---
Type: generic
ID: python3-webassets
Name:
  C: webassets
Packages:
  - python3-webassets
Provides:
  python3:
    - webassets
---
Type: generic
ID: PackageKit
Name:
  C: PackageKit
Packages:
  - packagekit
  - libpackagekit-glib2-16
Provides:
  libraries:
    - libpackagekit-glib2.so.16
  binaries:
    - pkcon
    - pkmon
---
Type: desktop-app
ID: iceweasel.desktop
Name:
  C: Iceweasel
Packages:
  - iceweasel
Summary:
  C: Web browser
  fr_FR: Navigateur web
Description:
  C: |
    <p>A webbrowser made by Mozilla.</p>
    <p>Blahblahblah!</p>
Keywords:
  C:
    - internet
    - web
    - browser
  fr_FR:
    - navigateur
Icon:
  stock: web-browser
  cached: firefox.png
Categories:
  - network
  - web
Url:
  homepage: http://www.mozilla.com
Screenshots:
  - default: yes
    caption:
        C: Application doing Foo
    source-image:
      width: 800
      height: 600
      url: http://www.awesomedistro.example.org/en_US/firefox.desktop/main.png
    thumbnails:
      - width: 200
        height: 150
        url: http://www.awesomedistro.example.org/en_US/firefox.desktop/main-small.png
  - caption:
      C: Application doing Bar
    source-image:
      width: 800
      height: 600
      url: http://www.awesomedistro.example.org/en_US/firefox.desktop/config.png
    thumbnails:
      - width: 200
        height: 150
        url: http://www.awesomedistro.example.org/en_US/firefox.desktop/config-small.png
ProjectGroup: Mozilla
Provides:
  binaries:
    - iceweasel
  mimetypes:
    - text/html
    - text/xml
    - application/xhtml+xml
    - application/vnd.mozilla.xul+xml
    - text/mml
    - application/x-xpinstall
    - x-scheme-handler/http
    - x-scheme-handler/https
---
Type: generic
ID: fw-ipw2x00
Name:
  C: Ipw2x00 Firmware
Packages:
  - firmware-ipw2x00
Provides:
  firmware:
    - ipw2200-bss.fw
---
Type: generic
ID: foo
Name:
  C: Foo
Packages:
  - foo2
Provides:
  dbus:
    - type: system
      service: org.example.foo
---
...
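To show how a client might take a first look at such a file, the sketch below splits the stream into its "---"-separated documents and reads only the unindented scalar keys (such as Type and ID) of each one. It deliberately uses only the Python standard library and is not a YAML parser; a real consumer would more likely use a complete YAML library. The function names and the shortened sample are assumptions made for illustration.

```python
def split_documents(text):
    """Split a multi-document stream on '---' separator lines."""
    documents, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            if current:
                documents.append(current)
            current = []
        else:
            current.append(line)
    if current:
        documents.append(current)
    return documents


def top_level_scalars(lines):
    """Collect unindented 'Key: value' pairs; nested blocks are skipped."""
    result = {}
    for line in lines:
        if line[:1].isspace() or ":" not in line:
            continue  # indented (nested) line or not a key/value line
        key, _, value = line.partition(":")
        value = value.strip()
        if value:  # keys that only open a nested block have no value
            result[key] = value.strip("'\"")
    return result


SAMPLE = """\
---
File: DEP-11
Version: '0.6'
Origin: debian-sid
---
Type: generic
ID: python3-webassets
Packages:
  - python3-webassets
---
Type: desktop-app
ID: iceweasel.desktop
Packages:
  - iceweasel
"""

# The first document is the header; the rest describe components.
header, *components = [top_level_scalars(doc) for doc in split_documents(SAMPLE)]
print(header)
print([c["ID"] for c in components])
```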

Implementation

Apt / tools which process DEB packages

Most of the work has already been done for Apt as part of a GSoC 2012 project to make it support more metadata. Whether to allow "<" and ">" in a package's "Provides" field, in order to support component metadata, still has to be discussed.

All other tools might need adjustments if they parse the "Provides" field. It still needs to be checked which programs are affected. In any case, the changes to support this should be minimal (in most cases, tools can simply ignore all dependencies which contain a "<").
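A tool that wants to ignore component entries could filter them out before further processing. The sketch below is one way to do that. It adds one detail the text does not mention: versioned relations such as "libfoo (<< 2.0)" also contain "<", so the parenthesised version constraint is removed before the check. The function names are hypothetical.

```python
import re

# Version constraints such as "(<< 2.0)" or "(<= 1.4)" also contain "<",
# so they are stripped before looking for the component syntax.
VERSION_RE = re.compile(r"\([^)]*\)")


def is_component(entry):
    """True if the entry uses the proposed type<name> syntax."""
    return "<" in VERSION_RE.sub("", entry)


def without_components(field_value):
    """Drop component entries from a comma-separated relationship field."""
    entries = (entry.strip() for entry in field_value.split(","))
    return ", ".join(e for e in entries if e and not is_component(e))


cleaned = without_components(
    "libfoo (<< 2.0), mime<text/plain>, bar | baz, lib<libfoo.so.2>"
)
print(cleaned)
```

Without the version-constraint step, a naive check for "<" would also discard legitimate versioned entries.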

debhelper

So that packagers do not each have to maintain the metadata on their own, we suggest auto-generating the data using a small debhelper script (dh_components, dh_metadata, ...) and adding it to the package. This will add the missing data quickly.

Implementation hints for ftpmaster (building of AppInfo file)

This only touches the AppStream data, because this is the data we have to generate on the server. ftpmaster might want to look at app-install-data's extractor code (for example, on git) for hints on how icons can be located and named. All data can be generated by one script and be updated per-package, so the use of resources should be low. Much information can already be generated by just scanning the Contents.tar.gz file.

Client-side implementation

The data from the ComponentMetadata file will be used to generate a Xapian database, like the AppStream code does. The only difference is that we won't generate the Xapian DB from XML files. This Xapian database will then be queried by the Software Center or another application management tool. The component information will be fetched via apt and be available for PackageKit and APTDaemon, so other applications can make use of it.

The Fedora approach

In Fedora a similar feature is implemented using the Provides RPM field. Supported MIME types are listed like 'mimehandler(text/plain)' in the list of provides for a given RPM, and provided fonts are listed like 'font(FontName)' in the same field. The same is done for Perl modules using 'perl(module)', TeX packages using 'tex(packagename)', OCaml modules using 'ocaml(modulename)', Windows DLLs using 'mingw32(dllname)', Mono using 'mono(module)' and GStreamer features using 'gstreamer0.10(feature)'. This makes it possible to install a package by naming the modules, fonts or MIME types it provides. For example, a simple 'yum install tex(packagename)' will install a particular TeX package.

More information is available from

FAQs

Why don't you place component data in an extra file too?

Placing component data directly in packages has many advantages:

Couldn't we store the application information in binary packages too?

This is not a good idea. We would have to duplicate that data for every architecture, and everyone would have that data installed, even those not using Debian with a GUI at all. By splitting the application info out into a new file, nobody has to download potentially useless data, and downloading updated repository data will be faster on slow machines. Downloading this data on demand is the better approach. Also, because we need to process all packages in dak anyway to generate the icon tarball, we can easily extract the other data there too, so icons and application data stay in sync (which is essential for software centers).

Couldn't you use debtags?

No. Debtags is about tagging packages manually with categories, while DEP-11 contains a proposal to build machine-readable metadata. Debtags was not designed to store DEP-11 metadata, and any attempt to add it there would clutter the tags and render Debtags completely useless. However, it is absolutely possible to auto-generate Debtags from DEP-11 metadata: for example, the AppStream info can be used to auto-tag a package as a GUI application, and packages with "exec" components can be marked as programs. Tagging Python/Perl/Ruby/etc. modules automatically is also easily possible.
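Such auto-tagging could look roughly like the sketch below. It takes a component as a dictionary shaped like the entries in the Components.yml example (with "Type" and "Provides" keys) and derives tag suggestions from it. The tag names are placeholders chosen for illustration and would need to be checked against the actual Debtags vocabulary; the function suggest_tags is hypothetical.

```python
# Placeholder tag names for illustration only; verify them against the
# real Debtags vocabulary before relying on them.
GUI_TAG = "interface::graphical"
PROGRAM_TAG = "role::program"
PYTHON_TAG = "implemented-in::python"


def suggest_tags(component):
    """Derive tag suggestions from one DEP-11 component dictionary."""
    tags = set()
    if component.get("Type") == "desktop-app":
        tags.add(GUI_TAG)
    provides = component.get("Provides", {})
    if provides.get("binaries"):
        tags.add(PROGRAM_TAG)
    if provides.get("python2") or provides.get("python3"):
        tags.add(PYTHON_TAG)
    return sorted(tags)


browser = {"Type": "desktop-app", "Provides": {"binaries": ["iceweasel"]}}
module = {"Type": "generic", "Provides": {"python3": ["webassets"]}}
print(suggest_tags(browser))
print(suggest_tags(module))
```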

Can we store $arbitrary_upstream_data in DEP-11?

Stuff like upstream VCS, VCS tagging scheme, upstream bug tracker and other upstream information is out of scope for DEP-11. DEP-11 is mostly about auto-generated metadata. However, it might make sense to add information like this to the additional AppStream application information file, if we can find the info somewhere. Maybe providing a DOAP file for the packages might be nice too, but that is not something DEP-11 cares about. Take a look at DEP-12, which was designed to cover that issue.

Comments