Title: Extend Debian Package Metadata and add AppStream Features
DEP: 11
State: CANDIDATE
Date: 2014-08-02
URL: http://wiki.debian.org/DEP-11
Source: http://wiki.debian.org/DEP-11?action=info
Drivers: Matthias Klumpp <firstname.lastname@example.org>, Julian Andres Klode <email@example.com>, Michael Vogt <firstname.lastname@example.org>
License: GPL-3
Abstract: Proposal to enhance metadata exposed by Debian packages, as well as adding new files to Debian repositories which provide all data required for the cross-distro application manager project AppStream.
This is an updated draft of DEP-11! If you want the old version instead, please look here.
The proposal will soon be rewritten to use YAML as the default format for storing metadata. Please wait for that to happen and do not rely on the information this page provides!
This page contains a proposal for how Debian could provide useful package metadata to applications and users, and implement its own version of the cross-distribution AppStream specification, so that we can get rid of big packages like "app-install-data".
The proposed solution addresses two goals:
- Implement a Debian-style version of the AppStream project metadata to provide information about available applications.
- Provide new metadata describing the components a package contains.
AppStream is a cross-distro effort to provide an application manager ("AppStore") for all distributions, with advanced features like ratings and reviews. It uses the well-known Open Collaboration Services API to achieve this. The project should also improve collaboration with upstreams as well as with other distributions (packages can be compared easily using this metadata, which is useful e.g. for sharing patches). More detailed information about AppStream can be found at Freedesktop.Distributions.
- Component metadata in binary packages
- Proposal for AppStream support
- The Fedora approach
Component metadata in binary packages
Additions to the "Provides" field
We propose adding component metadata to the "Provides" field of binary Debian packages, which may then be referenced by other packages' "Suggests" fields. (No other fields should make use of these metadata elements, as they are likely provided by multiple packages and could cause a lot of trouble.)
Examples of components are shared libraries, pkg-config files, KDE Plasma data providers, fonts, codecs, firmware, Perl modules, Python modules, Haskell modules, printer drivers, etc. Applications can use this information to automatically load missing functionality on request. This is already implemented for RPM-based distributions using PackageKit. Applications trying to use PackageKit to search for components on Debian currently get very poor results due to the missing metadata (at the moment, components are guessed from package names).
Metadata is defined in the form "type<name>", for example:
Provides: exec<foobar>, lib<libfoo.so.2>, python3<foobar>, mime<text/x-foobar>
This lets people ask apt-get to install a handler for a missing MIME type, a missing library, a Perl module, or anything else that is a supported component type. Component metadata entries cannot be confused with normal package names, because angle brackets are not allowed in package names. The "Provides" line should only reference components (i.e. resources) this package supports, not applications, since doing that would result in a large mess.
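To make the parsing rule concrete, here is a minimal Python sketch of how a tool could separate component entries from ordinary package names in a "Provides" value. The function name `split_provides` and the assumption that component types consist only of lower-case letters and digits are illustrative; this is not an existing Apt or dpkg API.

```python
import re

# Assumption: a component entry is "type<name>" with a lower-case alphanumeric
# type. Debian package names cannot contain "<" or ">", so the brackets alone
# are enough to tell a component apart from a (virtual) package name.
COMPONENT_RE = re.compile(r"^([a-z0-9]+)<([^<>]+)>$")


def split_provides(field: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Split a Provides value into (type, name) components and plain package names."""
    components: list[tuple[str, str]] = []
    packages: list[str] = []
    for entry in field.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = COMPONENT_RE.match(entry)
        if match:
            components.append((match.group(1), match.group(2)))
        else:
            # Anything else, including versioned entries like "foo (= 1.0)",
            # is treated as an ordinary package relation.
            packages.append(entry)
    return components, packages
```

Applied to the example line above, all four entries are returned as components and the list of plain package names is empty.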
A list of possible components can be found here. We suggest supporting the following component types:
Supported component types
Description: Any executable this package provides in /usr/bin. (This would also solve Bug #638517 in a nice way.)
Example: exec<nano>, exec<vim>, exec<iceweasel>
Description: The Plasma services (KDE4 desktop) this package provides.
Description: Lists all shared libraries this package provides.
Example: lib<libprojectM.so.2>, lib<libogg.so>, lib<some-gstreamer-plugin.so.1>
Description: A list of all Python 2 modules this package provides.
Description: A list of all Python 3 modules this package provides.
Description: The "MimeType" field of the .desktop file, with one entry per MIME type. This component indicates that the program in this package can handle the given MIME type.
Description: A list of "modalias" globs representing the hardware (for example USB, PCI, ACPI, DMI devices) this package handles. Useful for installing printer drivers, USB protocol drivers for smartphones, firmware, kernel drivers that are not yet merged upstream, and so on.
Description: A list of firmware files included in the package, making it possible to find the right firmware package to install for a given kernel driver. The value is the path of the firmware file below /lib/firmware, matching the firmware value exported by Linux kernel modules.
Example: firmware<ipw2200-bss.fw>, firmware<vcis/COMpad2.cis>
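The modalias and firmware types are meant to answer the question "which package do I need for this device or driver?". The following Python sketch shows one way a client could answer it, assuming it already has, per package, the globs and firmware paths taken from the component entries (the name of the modalias component type is not given above, so the sketch works on the values only). All function and package names here are hypothetical.

```python
from fnmatch import fnmatchcase


def packages_for_modalias(modalias: str, globs_by_package: dict[str, list[str]]) -> list[str]:
    """Return the packages whose declared modalias globs match a device's modalias."""
    # fnmatchcase keeps the match case-sensitive: modalias strings mix
    # lower-case field prefixes with upper-case hexadecimal IDs.
    return sorted(
        package
        for package, globs in globs_by_package.items()
        if any(fnmatchcase(modalias, glob) for glob in globs)
    )


def packages_for_firmware(needed: list[str],
                          firmware_by_package: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each firmware path a driver requests to the packages providing it."""
    # Invert package -> firmware paths into firmware path -> packages.
    index: dict[str, list[str]] = {}
    for package, paths in sorted(firmware_by_package.items()):
        for path in paths:
            index.setdefault(path, []).append(package)
    # Paths that no package provides map to [], so callers can report them.
    return {path: index.get(path, []) for path in needed}
```

On a running system, the firmware files a driver requests can be listed with `modinfo -F firmware <module>`, and a device's modalias string is exposed in sysfs as a `modalias` file.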
Other potential components
- per-language components: C/C++ include headers, m4, Perl, OCaml, Haskell, etc.
- documentation: doc-base, man, info
- language support: dictionaries, OCR, TTS, STT
- pkg-config files
- graphical stuff: icons, themes, cursors
- audio themes
- GObject introspection
- shell completion
- port numbers (apt-get install port<80>)
Proposal for AppStream support
The AppInfo index file
We propose adding a new index file for application installers such as Software Center, to replace the manually maintained app-install-data package. That package needs to mirror and scan large portions of the archive in order to build, and is therefore often out of date.
Moving the metadata to the server side provides up-to-date information about the available applications. The metadata we need is the information from the .desktop files in /usr/share/applications/, together with the icons those .desktop files use.
AppStream suggests using an XML file to provide the metadata. As Debian does not use XML anywhere else in the archive, and we do not expect anyone in Debian to like XML, we propose a simple YAML-based format for the index.
Here is a list of the fields we propose for this index file.
A block needs to contain a "Package" field and an "Architectures" field, as well as "Application" and "Name" fields. All other fields are optional (but recommended).
Description: The same as in a Packages file.
Description: The same as in a Packages file. Used to associate the components described with exactly one .deb package (and to display a version for applications).
Description: Contains all architectures this package has been built for (amd64, i386, armel, kfreebsd-*, etc.). This field is not present in the AppStream specification; it is included here to uniquely associate a .desktop file with exactly one .deb file. It also makes it possible to exclude metadata if a package is not (yet) present on an architecture, avoiding a lot of duplication and wasted disk space.
Description: Name of the .desktop file; should serve as a unique identifier. In practice, the name of a .desktop file is not always unique, for example for packages that build optimized and unoptimized versions of the application.
Description: The "Name" field of the .desktop file.
Description: Localized name: the "Name[lang]" field of the .desktop file.
Description: The "Comment" field of the .desktop file.
Description: Localized comment: the "Comment[lang]" field of the .desktop file.
Description: The "Keywords" field of the .desktop file.
Description: Localized keywords: the "Keywords[lang]" field of the .desktop file.
Description: The name of the icon, created from the .desktop "Icon" field. This one is a bit more complicated: if the original icon was a path, we need to rename it to something that is not a path, for example by replacing the separators with underscores.
Description: The "Categories" field of the .desktop file, post-processed. The "Categories" field of a .desktop file is separated by semicolons; we probably want to use commas here.
Description: As in the Packages file. This one appears in the original XML specification; we could read it from the Packages file, but it probably does no harm to copy it here, so that all display information is in one place.
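A minimal Python sketch of how one block of the index could be derived from a .desktop file, including the post-processing described for the icon and categories fields. Only "Package", "Architectures", "Application" and "Name" are named in the text; the other key names used here ("Version", "Comment", "Keywords", "Icon", "Categories", and desktop-style localized keys such as "Name[fr]") are assumptions, as are all function names.

```python
# Hypothetical helpers. AppInfo key names other than Package, Architectures,
# Application and Name are assumptions, not part of the proposal text.


def parse_desktop_entry(text: str) -> dict[str, str]:
    """Collect key=value pairs from the [Desktop Entry] group of a .desktop file."""
    entries: dict[str, str] = {}
    in_main_group = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # Only the main group matters; e.g. [Desktop Action ...] is skipped.
            in_main_group = line == "[Desktop Entry]"
            continue
        if in_main_group and "=" in line:
            key, value = line.split("=", 1)
            entries[key.strip()] = value.strip()
    return entries


def icon_name(icon: str) -> str:
    """Turn a path-valued Icon into a flat name by replacing separators with '_'."""
    return icon.strip("/").replace("/", "_") if "/" in icon else icon


def split_list(value: str) -> list[str]:
    """Split a semicolon-separated .desktop list, dropping empty items."""
    return [item for item in value.split(";") if item]


def appinfo_record(desktop_file: str, text: str, package: str, version: str,
                   architectures: list[str]) -> dict[str, str]:
    """Build one AppInfo block as a flat mapping of field name to value."""
    entry = parse_desktop_entry(text)
    record = {
        "Package": package,
        "Version": version,  # assumed key name
        "Architectures": ", ".join(architectures),
        "Application": desktop_file,
    }
    for key, value in entry.items():
        base = key.split("[", 1)[0]  # "Name[fr]" -> "Name"
        if base in ("Name", "Comment"):
            record[key] = value
        elif base == "Keywords":
            record[key] = ", ".join(split_list(value))
    if "Icon" in entry:
        record["Icon"] = icon_name(entry["Icon"])
    if "Categories" in entry:
        # Semicolons in the .desktop file become commas in the index.
        record["Categories"] = ", ".join(split_list(entry["Categories"]))
    return record
```

For an Icon value such as /usr/share/pixmaps/foobar/icon.png, `icon_name` yields usr_share_pixmaps_foobar_icon.png, while a plain theme name like web-browser is kept unchanged.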
Location in the Archive
We propose that the indices be provided alongside the Packages files, that is, in dists/<SUITE>/<COMPONENT>/AppInfo.xy, compressed with whatever ftpmaster wishes to use. This file will only be downloaded on demand, e.g. by the Software Center; on servers it is not really required.
We also need to store the icons for the applications somewhere, e.g. as a tarball of those icons provided as part of the archive. That tarball could be located at something like dists/<SUITE>/<COMPONENT>/Applications-Icons.tar.gz.
The icons in the tarball should probably be 32x32 pixels and located in the "icons/32x32" subdirectory, at least according to the AppStream site. Other schemes could be handled too.
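The tarball layout described above can be produced with Python's standard tarfile module. This sketch assumes the icons have already been located and scaled to 32x32 elsewhere (scaling is out of scope here) and only shows the directory layout; the function name is hypothetical.

```python
import io
import tarfile


def build_icon_tarball(icons: dict[str, bytes]) -> bytes:
    """Pack {icon_name: image_bytes} into a .tar.gz with all icons under icons/32x32/."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        # Sorted for a reproducible member order.
        for name, data in sorted(icons.items()):
            info = tarfile.TarInfo(name=f"icons/32x32/{name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

The resulting bytes would then be published under a path such as dists/<SUITE>/<COMPONENT>/Applications-Icons.tar.gz.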
Example of Components.yml
---
File: DEP-11
Version: '0.6'
Origin: debian-sid
---
Type: generic
ID: python3-webassets
Name:
  C: webassets
Packages:
  - python3-webassets
Provides:
  python3:
    - webassets
---
Type: generic
ID: PackageKit
Name:
  C: PackageKit
Packages:
  - packagekit
  - libpackagekit-glib2-16
Provides:
  libraries:
    - libpackagekit-glib2.so.16
  binaries:
    - pkcon
    - pkmon
---
Type: desktop-app
ID: iceweasel.desktop
Name:
  C: Iceweasel
Packages:
  - iceweasel
Summary:
  C: Web browser
  fr_FR: Navigateur web
Description:
  C: |
    <p>A webbrowser made by Mozilla.</p>
    <p>Blahblahblah!</p>
Keywords:
  C:
    - internet
    - web
    - browser
  fr_FR:
    - navigateur
Icon:
  stock: web-browser
  cached: firefox.png
Categories:
  - network
  - web
Url:
  homepage: http://www.mozilla.com
Screenshots:
  - default: yes
    caption:
      C: Application doing Foo
    source-image:
      width: 800
      height: 600
      url: http://www.awesomedistro.example.org/en_US/firefox.desktop/main.png
    thumbnails:
      - width: 200
        height: 150
        url: http://www.awesomedistro.example.org/en_US/firefox.desktop/main-small.png
  - caption:
      C: Application doing Bar
    source-image:
      width: 800
      height: 600
      url: http://www.awesomedistro.example.org/en_US/firefox.desktop/config.png
    thumbnails:
      - width: 200
        height: 150
        url: http://www.awesomedistro.example.org/en_US/firefox.desktop/config-small.png
ProjectGroup: Mozilla
Provides:
  binaries:
    - iceweasel
  mimetypes:
    - text/html
    - text/xml
    - application/xhtml+xml
    - application/vnd.mozilla.xul+xml
    - text/mml
    - application/x-xpinstall
    - x-scheme-handler/http
    - x-scheme-handler/https
---
Type: generic
ID: fw-ipw2x00
Name:
  C: Ipw2x00 Firmware
Packages:
  - firmware-ipw2x00
Provides:
  firmware:
    - ipw2200-bss.fw
---
Type: generic
ID: foo
Name:
  C: Foo
Packages:
  - foo2
Provides:
  dbus:
    - type: system
      service: org.example.foo
---
...
Apt / tools which process DEB packages
Most of the work for Apt was already done as part of a GSoC 2012 project to make it support more metadata. Allowing "<" and ">" in a package's "Provides" field, as component metadata requires, still has to be discussed.
Other tools might need adjustments if they parse the "Provides" field; which programs are affected still needs to be checked. In any case, the changes needed to support this will be minimal: in most cases, tools can simply ignore every relation whose package name contains a "<" (as opposed to a version constraint such as "(<< 2.0)", which may legitimately contain one).
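One caveat when filtering on "<": ordinary versioned relations such as "libbar (<< 2.0)" already contain that character. A sketch of a filter for binary-package relation fields that only inspects the package-name part of each alternative might look like this in Python; the function name is hypothetical, and source-package fields such as Build-Depends are not handled.

```python
def legacy_relations(field: str) -> list[str]:
    """Drop component entries (type<name>) from a binary-package relation field.

    Only the text before any "(" is checked, so a versioned relation such as
    "libbar (<< 2.0)" is kept even though it contains "<".
    """
    kept: list[str] = []
    for relation in field.split(","):
        alternatives = [alt.strip() for alt in relation.split("|") if alt.strip()]
        plain = [alt for alt in alternatives if "<" not in alt.split("(", 1)[0]]
        if plain:
            # A relation made up only of components disappears entirely.
            kept.append(" | ".join(plain))
    return kept
```

A tool that does not understand component metadata would then see "libc6 (>= 2.14), exec<nano>, foo | exec<bar>" simply as libc6 (>= 2.14) and foo.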
So that packagers do not have to maintain the metadata by hand, we suggest auto-generating the data with a small debhelper script (dh_components, dh_metadata, ...) that adds it to the package. This would quickly fill in the missing data.
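The following Python sketch illustrates what such a debhelper script might compute from the list of files a binary package installs. It is hypothetical (dh_components is only one of the names floated above) and uses simplified path rules as assumptions: executables directly in /usr/bin, files under /lib/firmware, top-level modules in /usr/lib/python3/dist-packages, and shared objects under /lib or /usr/lib.

```python
from pathlib import PurePosixPath

PY3_SITE = ("/", "usr", "lib", "python3", "dist-packages")


def components_from_files(paths: list[str]) -> list[str]:
    """Derive type<name> component entries from a package's installed files."""
    found: dict[str, None] = {}  # insertion-ordered set, avoids duplicates
    for raw in paths:
        path = PurePosixPath(raw)
        parts = path.parts
        if path.parent == PurePosixPath("/usr/bin"):
            # exec: executables installed directly in /usr/bin
            found.setdefault(f"exec<{path.name}>", None)
        elif parts[:3] == ("/", "lib", "firmware") and len(parts) > 3:
            # firmware: the path below /lib/firmware, as kernel modules request it
            found.setdefault(f"firmware<{path.relative_to('/lib/firmware')}>", None)
        elif parts[:len(PY3_SITE)] == PY3_SITE and len(parts) > len(PY3_SITE):
            # python3: top-level modules and packages in dist-packages
            top = parts[len(PY3_SITE)]
            if top == "__pycache__" or top.endswith((".egg-info", ".dist-info")):
                continue
            is_package_dir = len(parts) > len(PY3_SITE) + 1
            name = top if is_package_dir else top.split(".", 1)[0]
            found.setdefault(f"python3<{name}>", None)
        elif (parts[1:2] == ("lib",) or parts[1:3] == ("usr", "lib")) \
                and path.name.startswith("lib") and ".so" in path.name:
            # lib: shared objects (and .so symlinks) under /lib or /usr/lib
            found.setdefault(f"lib<{path.name}>", None)
    return list(found)
```

The result could be joined with ", " and appended to the package's "Provides" field at build time.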
Implementation hints for ftpmaster (building of AppInfo file)
This only concerns the AppStream data, because this is the data we have to generate on the server. ftpmaster might want to look at app-install-data's extractor code (for example, in its git repository) for hints on how icons can be located and named. All data can be generated by one script and updated per package, so resource usage should be low. Much of the information can already be generated just by scanning the Contents-<arch>.gz files.
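As an example of how far the Contents files go, this Python sketch maps packages to the .desktop files they ship, which tells the generator which packages need to be unpacked at all. It assumes the usual Contents-<arch> line format (a file path without leading slash, whitespace, then a comma-separated list of section-qualified package names); the function name is hypothetical.

```python
from collections.abc import Iterable

APPLICATIONS_DIR = "usr/share/applications/"  # Contents paths have no leading "/"


def desktop_files_by_package(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map package name -> .desktop files it ships, from Contents-<arch> lines."""
    index: dict[str, list[str]] = {}
    for line in lines:
        # The package list is the last whitespace-separated field; paths may
        # contain spaces themselves, so split only once, from the right.
        fields = line.rstrip("\n").rsplit(None, 1)
        if len(fields) != 2:
            continue
        path, locations = fields
        if not (path.startswith(APPLICATIONS_DIR) and path.endswith(".desktop")):
            continue  # also skips a "FILE LOCATION" header line, if present
        for location in locations.split(","):
            # "section/package" or "area/section/package": keep the last part.
            package = location.rsplit("/", 1)[-1]
            index.setdefault(package, []).append(path.rsplit("/", 1)[-1])
    return index
```

A compressed file can be passed in directly, e.g. `with gzip.open("Contents-amd64.gz", "rt", encoding="utf-8") as f: desktop_files_by_package(f)`.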
The data from the ComponentMetadata file will be used to generate a Xapian database, as the AppStream code does; the only difference is that we won't generate the Xapian database from XML files. This Xapian database will then be queried by the Software Center or another application management tool. The component information will be fetched via apt and be available to PackageKit and APTDaemon, so other applications can make use of it.
The Fedora approach
In Fedora a similar feature is implemented using the RPM Provides field. Supported MIME types are listed like 'mimehandler(text/plain)' in the list of provides for a given RPM, and provided fonts are listed like 'font(FontName)' in the same field. The same is done for Perl modules using 'perl(module)', TeX packages using 'tex(packagename)', OCaml modules using 'ocaml(modulename)', Windows DLLs using 'mingw32(dllname)', Mono using 'mono(module)' and GStreamer features using 'gstreamer0.10(feature)'. This makes it possible to install a package by the modules, fonts or MIME types it provides. To install a specific TeX package, a simple 'yum install tex(packagename)' will do.
More information is available from
Why don't you place component data in an extra file too?
Placing component data directly in packages has many advantages:
- Applications and people can rely on this data being present, whereas the additional file in the repository is optional.
- Every Debian derivative, even those not using dak, will have this metadata, whereas the AppStream metadata implementation, for example, will only be available for dak.
- If a program requires a certain module to be installed (e.g. a Python module) and checks for the presence of this component using Apt/PackageKit, people can satisfy this dependency by installing a local package. This is not possible with an extra file that only exists in package archives.
- It reduces server load, since the buildds will generate that data, and less code needs to be added to dak.
- It avoids needless duplication of data (package names, descriptions, etc.).
- The data is always up to date. While the dak AppStream data generator only runs once every two weeks, component data is available and usable immediately. (For application info this delay is not an issue, but component metadata should be present immediately.)
Couldn't we store the application information in binary packages too?
This is not a good idea. We would have to duplicate that data for every architecture, and everyone would have that data installed, even if they are not using Debian with a GUI at all. By splitting the application info out into a new file, nobody has to download potentially useless data, and downloading updated repository data will be faster on slow machines. Downloading this data on demand is the better approach. Also, because we need to process all packages in dak anyway to generate the icon tarball, we can easily extract the other data there too, which keeps icons and application data in sync (this is essential for software centers).
Couldn't you use debtags?
No. Debtags is about tagging packages manually with categories, while DEP-11 proposes building machine-readable metadata automatically. Debtags was not designed to store DEP-11 metadata, and any attempt to add it there would clutter the tags and render Debtags completely useless. However, it is entirely possible to auto-generate Debtags from DEP-11 metadata: for example, the AppStream info can be used to auto-tag a package as a GUI application, and packages with "exec" components can be marked as programs. Tagging Python/Perl/Ruby/etc. modules automatically is also easily possible.
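To illustrate the auto-tagging idea, here is a Python sketch that suggests tags for one parsed component entry, keyed like the example Components file (a "Type" value and a "Provides" mapping with keys such as binaries, libraries and python3). The mapping and the Debtags tag names are illustrative assumptions, not an agreed vocabulary.

```python
# Hypothetical mapping from DEP-11 "Provides" keys to Debtags tags; the tag
# names are assumptions chosen for illustration.
PROVIDES_TAGS = {
    "binaries": "role::program",
    "libraries": "role::shared-lib",
    "python3": "devel::lang:python",
}


def suggest_tags(component: dict) -> set[str]:
    """Suggest Debtags-style tags for one parsed DEP-11 component entry."""
    provides = component.get("Provides", {})
    tags = {tag for key, tag in PROVIDES_TAGS.items() if provides.get(key)}
    if component.get("Type") == "desktop-app":
        tags.add("interface::x11")  # assumed tag for GUI applications
    return tags
```

For the PackageKit entry in the example file this would suggest role::program and role::shared-lib; the tags remain suggestions that Debtags maintainers could review.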
Can we store $arbitrary_upstream_data in DEP-11?
Information such as the upstream VCS, the VCS tagging scheme, the upstream bug tracker and other upstream data is out of scope for DEP-11, which is mostly about auto-generated metadata. However, it might make sense to add information like this to the additional AppStream application information file, if we can find it somewhere. Providing a DOAP file for the packages might be nice too, but that is not something DEP-11 cares about. Take a look at DEP-12, which was designed to cover that issue.