/!\ This page is work in progress and may be inaccurate. /!\ TODO: Integrate into Multiarch/InterpreterProposal.

Proposed changes to the Multiarch Spec

The current multiarch specification cannot express a few situations that actually occur in the Debian archive. This document explains the specific sub-problems and proposes solutions. It is based on discussions held during DebConf13.

Issues

This section describes known limitations of the current specification and explains the kind and number of packages affected.

Library extensions

Consider a shared library in one package and an extension to this library in a different package. As soon as the library becomes M-A:same, it can be installed for multiple architectures simultaneously. To extend the library, the extension must be available for the same architectures as the library. Currently there is no way to express this condition to dpkg. The issue comes in two flavours.

LD_PRELOAD

It is possible to change the behaviour of the C library by exporting the LD_PRELOAD environment variable. A package that uses this technique usually has a shared library and a shell script that sets up LD_PRELOAD to point to its library. A user installing such a package would expect it to work in a multiarch environment. That means that the shared library should be available for all the architectures the user uses. Currently, such packages are only installed for the native architecture (by default) and are not available for foreign architectures. As of this writing, the only tool that can be used on foreign architectures at all is fakechroot, because it depends on an M-A:same libfakechroot. A user can install libfakechroot for multiple architectures, but this does not happen automatically.

Usually there is no dependency relation between any of these tools and the programs whose behaviour is changed.

Affected packages:

Library plugin

A shared library may provide an interface for extension by loading further libraries at runtime. Two examples of this technique are PAM and NSS. PAM modules are loaded dynamically into programs that use libpam for authenticating users. NSS modules are loaded dynamically into programs that use the C library for name or user resolution. In both cases, programs link against libpam0g or libc6, which are both M-A:same. Neither usually expresses any dependency relation on the modules, so it is possible that the modularized library is installed for multiple architectures, but the configured extension modules are not.

Usually there is no dependency relation between these plugins and programs that are affected by linking libpam0g or libc6.

Affected packages:

TODO: update numbers against sid

Interpreter issue

Interpreted languages such as Perl or Python can be extended with architecture dependent as well as architecture independent modules that may interact with each other. This case was envisaged when the current multiarch specification was written. The idea was that executable interpreters should be marked with M-A:allowed. Architecture independent modules or libraries are considered to have a special "native" architecture (the architecture of dpkg). This means that installing an architecture independent module together with a foreign interpreter (one for a non-native architecture) is impossible. Marking the dependency of those modules on the interpreter with the :any qualifier is only possible for a subset of modules. Once a module needs both an interpreter and an architecture specific extension to the interpreter, it must ensure that the two architectures match. This is not possible with the current specification. In general, installing an interpreter for a foreign architecture is currently not possible.

The situation with embedded interpreters is worse. They are loaded into other programs as shared libraries. Being libraries, they are marked with M-A:same. Using both an embedded interpreter and an architecture specific interpreter extension now depends on the availability of both for the desired architecture. In general, installing an embedded interpreter for a foreign architecture currently results in the relevant extensions not being available even though dependencies are satisfied.

A good explanation of the issues, including examples, was given by Guillem Jover. Essentially, the usability of an architecture independent module on a particular architecture depends on the availability of all of its recursive dependencies for that architecture. This restriction currently cannot be expressed to the dependency system. The current specification says that all architecture independent modules should be treated as if they had the native architecture.
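The restriction on recursive dependencies can be illustrated with a small sketch. It is a toy model and not how dpkg represents packages: the package names libfoo-perl and libbar-xs-perl, the dependency table, and the function usable_on are all hypothetical, and Multi-Arch markings other than the native/foreign distinction are ignored.

```python
from typing import Dict, List, Optional, Set, Tuple


def usable_on(
    pkg: str,
    arch: str,
    depends: Dict[str, List[str]],
    arch_all: Set[str],
    installed: Set[Tuple[str, str]],
    _seen: Optional[Set[str]] = None,
) -> bool:
    """Return True if pkg and all of its recursive dependencies are
    available for arch (simplified: every dependency must match arch)."""
    seen = set() if _seen is None else _seen
    if pkg in seen:
        # Already checked on this walk (or part of a dependency cycle).
        return True
    seen.add(pkg)
    # Architecture independent packages are recorded once, as "all".
    key = (pkg, "all") if pkg in arch_all else (pkg, arch)
    if key not in installed:
        return False
    return all(
        usable_on(dep, arch, depends, arch_all, installed, seen)
        for dep in depends.get(pkg, [])
    )


# Hypothetical example data.
depends = {
    "libfoo-perl": ["perl", "libbar-xs-perl"],  # Architecture: all
    "libbar-xs-perl": ["perl"],                 # architecture dependent
    "perl": [],
}
arch_all = {"libfoo-perl"}
installed = {
    ("perl", "amd64"),
    ("perl", "armhf"),
    ("libbar-xs-perl", "amd64"),
    ("libfoo-perl", "all"),
}

amd64_ok = usable_on("libfoo-perl", "amd64", depends, arch_all, installed)
armhf_ok = usable_on("libfoo-perl", "armhf", depends, arch_all, installed)
```

In this data the module is usable with the amd64 interpreter only, because the architecture dependent extension is missing for armhf, even though an armhf perl is installed.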

Affected packages: See http://people.debian.org/~helmutg/multiarch_interpreter.html

Note that not all languages mentioned as affected above can be embedded, but at least Perl, Python and Ruby can. A lower bound on the number of affected packages therefore is ~280.

In all of these cases, there is a dependency path starting in one of the affected architecture independent modules, passing an architecture dependent module, and ending in an interpreter package.

Solutions

The solutions presented here are roughly ordered by ascending complexity.

Conversion to Arch:any M-A:same

The interpreter issue can be mitigated by turning affected architecture independent modules into architecture dependent packages marked with M-A:same.

Solves: interpreter

Pros:

Cons:

Multi-Arch: all

A new value "all" for the Multi-Arch field can be added. This value implies the semantics of the value "same". In addition, it causes the package to be automatically installed for all native and foreign architectures configured with dpkg.
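A minimal sketch of the "install for every configured architecture" rule follows. The function missing_installs and its data layout are hypothetical; on a real system the two architecture lists could be obtained from dpkg --print-architecture and dpkg --print-foreign-architectures, but the sketch takes them as plain arguments.

```python
from typing import Dict, List, Set


def missing_installs(
    pkg: str,
    native: str,
    foreign: List[str],
    installed: Dict[str, Set[str]],
) -> List[str]:
    """List the architectures for which a 'Multi-Arch: all' package
    would still have to be installed (hypothetical helper)."""
    # Native architecture first, then each configured foreign one.
    wanted = [native] + [arch for arch in foreign if arch != native]
    present = installed.get(pkg, set())
    return [arch for arch in wanted if arch not in present]


# libfakechroot is currently installed for the native architecture only.
todo = missing_installs(
    "libfakechroot",
    native="amd64",
    foreign=["i386", "armhf"],
    installed={"libfakechroot": {"amd64"}},
)
```

Here todo lists i386 and armhf, the instances a package manager would have to add automatically under this proposal.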

Solves: library extension, interpreter

Cons:

Install-same-as header

The basic idea is to tell packages to be installed for all architectures that a given other package is installed for. To that end, a new optional header for binary packages can be added. Its value names one of the package's dependencies. A package carrying this header must be M-A:same. It is only considered installed if it is installed for all architectures that the listed package is installed for.

In case of libpam0g and libc6 all plugins and LD_PRELOAD libraries would list these packages in the new header. Architecture dependent modules to interpreters would list the respective shared interpreter library package.
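The "considered installed" rule for this header can be sketched as a subset check. This is an illustration under assumptions, not proposed dpkg code: the function considered_installed, the dictionary layout, and the package libnss-example are hypothetical.

```python
from typing import Dict, Set


def considered_installed(
    pkg: str,
    install_same_as: Dict[str, str],
    installed: Dict[str, Set[str]],
) -> bool:
    """Apply the Install-same-as rule: pkg counts as installed only if
    it covers every architecture of the package named in its header."""
    own = installed.get(pkg, set())
    if not own:
        return False
    target = install_same_as.get(pkg)
    if target is None:
        # No header: the ordinary notion of "installed" applies.
        return True
    return installed.get(target, set()) <= own


# Hypothetical NSS plugin that names libc6 in its Install-same-as header.
install_same_as = {"libnss-example": "libc6"}
installed = {"libc6": {"amd64", "i386"}, "libnss-example": {"amd64"}}

before = considered_installed("libnss-example", install_same_as, installed)
installed["libnss-example"].add("i386")
after = considered_installed("libnss-example", install_same_as, installed)
```

The plugin only counts as installed once it is present for both architectures of libc6, so before is False and after is True.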

Solves: library extension, interpreter

Cons:

Running architectures

The idea behind this approach is very similar to turning relevant packages into Architecture: any and M-A:same. The major difference here is that it is done internally to dpkg and applied to every architecture independent package automatically. Instead of considering architecture independent packages to be installed for the native architecture, dpkg tracks a set of architectures for which they are considered installed. New operations are added to dpkg to augment or shrink these sets. For instance, when removing a dependency of an architecture independent package, the set may need to be shrunk. On the other hand, installing a package can be done without extending architecture sets of other packages. Thus the dpkg state underapproximates the available functionality provided by packages. At a later time one may notice that the dependencies of an architecture independent package are satisfied for another architecture and then extend its corresponding architecture set.
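The augment and shrink operations can be sketched as follows. This is a toy model under assumptions: the class RunningArchs, its method names, and the packages interp, ext and mod-all are hypothetical, and dependency resolution is reduced to "every dependency must be available for the same architecture".

```python
from typing import Dict, List, Set


class RunningArchs:
    """Toy state: architecture sets for architecture independent packages."""

    def __init__(self, depends: Dict[str, List[str]], arch_all: Set[str]):
        self.depends = depends
        self.arch_all = arch_all
        self.installed: Dict[str, Set[str]] = {}  # architecture dependent
        self.running: Dict[str, Set[str]] = {}    # architecture independent

    def provides(self, pkg: str, arch: str) -> bool:
        table = self.running if pkg in self.arch_all else self.installed
        return arch in table.get(pkg, set())

    def _deps_ok(self, pkg: str, arch: str) -> bool:
        return all(self.provides(d, arch) for d in self.depends.get(pkg, []))

    def install(self, pkg: str, arch: str) -> None:
        """Install an architecture dependent package; no set is extended."""
        self.installed.setdefault(pkg, set()).add(arch)

    def install_all(self, pkg: str) -> None:
        """Unpack an architecture independent package with an empty set."""
        self.running.setdefault(pkg, set())

    def extend(self, pkg: str, arch: str) -> bool:
        """Augment the set of pkg if its dependencies allow it."""
        if self._deps_ok(pkg, arch):
            self.running.setdefault(pkg, set()).add(arch)
            return True
        return False

    def remove(self, pkg: str, arch: str) -> None:
        """Remove a package instance and shrink dependent sets to a fixpoint."""
        self.installed.get(pkg, set()).discard(arch)
        changed = True
        while changed:
            changed = False
            for name, archs in self.running.items():
                if arch in archs and not self._deps_ok(name, arch):
                    archs.discard(arch)
                    changed = True


state = RunningArchs({"mod-all": ["interp", "ext"]}, arch_all={"mod-all"})
state.install("interp", "amd64")
state.install("ext", "amd64")
state.install_all("mod-all")
initially = set(state.running["mod-all"])     # empty: underapproximation
state.extend("mod-all", "amd64")
state.install("interp", "armhf")
too_early = state.extend("mod-all", "armhf")  # ext is missing for armhf
state.install("ext", "armhf")
state.extend("mod-all", "armhf")
state.remove("ext", "amd64")                  # shrinks the set again
```

Installing ext for armhf does not change the set of mod-all by itself; only the explicit extend call does, which mirrors the underapproximation described above.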

In the context of the interpreter issue, this extension causes architecture independent modules to inherit the architecture information of the interpreters. For this to work with embedded interpreters, modules need to add the embedded interpreters (being M-A:same) as an alternative to their main interpreter dependency (at least in the Perl and Python worlds).

Solves: interpreter

Pros:

Cons:

TODO: find a methodology for estimating the number of affected packages

Running architectures with purpose tracking

As the title suggests, this is an extension to the "Running architectures" proposal. It addresses the package splitting aspect by tracking multiple architectures for architecture independent packages. Interpreter packages (M-A:allowed) are used as special markers in the dependency tree and induce new sets of architectures. A package (possibly indirectly) depending on e.g. Python is then considered to be installed for the purpose of using it with Python on a set of architectures that may be different from the set of architectures for other purposes.

When resolving dependencies, all M-A:allowed packages in the transitive dependency set are considered as purposes. For each purpose, the subset of packages indirectly depending on that purpose is considered. Dependencies are resolved within such a purpose and a set of running architectures is determined. In addition, there is an empty purpose covering all packages. A package's dependencies are considered satisfied if they are satisfied with respect to the empty purpose, or if all of its dependencies indirectly depend on some purpose package and its dependencies are satisfied with respect to all purposes except for the empty purpose. While the former condition is the same as with "Running architectures", the latter introduces a new way to satisfy dependencies.

In practice, this means that packages using e.g. both Perl and Python are no longer considered to be installed for a single architecture. Instead, the package may be running on a mix of architectures. Unless such a package is M-A:foreign itself, it will not be able to fulfil dependencies on it at all.

Pros: (see also "Running architectures")

Cons:

Example: isenkram

The isenkram package currently depends on both Perl modules (e.g. libgnome2-perl) and Python modules (e.g. python-gobject). All of its dependencies indirectly depend on either perl or python, which are intended to be M-A:allowed. Thus we are in the second case of the dependency satisfaction condition, where all dependencies must indirectly depend on some M-A:allowed package. Its dependencies can be partitioned into those that are part of the perl purpose and those that are part of the python purpose, and the two groups can be used with different architectures. Under the "Running architectures" proposal without purpose tracking this would not be possible, and isenkram would need to be split into two packages to make the usage of components explicit.
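The partitioning in this example can be sketched in a few lines. The dependency edges below are a simplified assumption (each module depends directly on its interpreter, and only two dependencies of isenkram are listed), and the helper names and installed architectures are hypothetical.

```python
from typing import Dict, List, Optional, Set


def purposes(pkg: str, depends: Dict[str, List[str]], allowed: Set[str],
             _seen: Optional[Set[str]] = None) -> Set[str]:
    """Return the M-A:allowed packages reachable from pkg."""
    seen = set() if _seen is None else _seen
    if pkg in seen:
        return set()
    seen.add(pkg)
    found = {pkg} if pkg in allowed else set()
    for dep in depends.get(pkg, []):
        found |= purposes(dep, depends, allowed, seen)
    return found


def partition_by_purpose(pkg: str, depends: Dict[str, List[str]],
                         allowed: Set[str]) -> Optional[Dict[str, List[str]]]:
    """Group the direct dependencies of pkg by purpose, or return None
    if some dependency reaches no purpose package at all."""
    groups: Dict[str, List[str]] = {}
    for dep in depends.get(pkg, []):
        reached = purposes(dep, depends, allowed)
        if not reached:
            return None  # the second satisfaction condition cannot apply
        for purpose in sorted(reached):
            groups.setdefault(purpose, []).append(dep)
    return groups


def archs_per_purpose(groups: Dict[str, List[str]],
                      installed: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Intersect the installed architectures of each group (simplified)."""
    return {
        purpose: set.intersection(*(installed.get(d, set()) for d in deps))
        for purpose, deps in groups.items()
    }


# Simplified, assumed dependency edges.
depends = {
    "isenkram": ["libgnome2-perl", "python-gobject"],
    "libgnome2-perl": ["perl"],
    "python-gobject": ["python"],
}
allowed = {"perl", "python"}
groups = partition_by_purpose("isenkram", depends, allowed)

# Hypothetical installation state: Perl side on amd64, Python side on armhf.
installed = {"libgnome2-perl": {"amd64"}, "python-gobject": {"armhf"}}
per_purpose = archs_per_purpose(groups, installed)
```

With these inputs the perl purpose runs on amd64 and the python purpose on armhf, which is the mix of architectures that purpose tracking permits and plain "Running architectures" does not.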

TODO: specification

TODO: implementation plan