Differences between revisions 8 and 9
Revision 8 as of 2013-12-15 10:34:03
Size: 14788
Editor: josch
Comment: added comments
Revision 9 as of 2013-12-15 16:09:48
Size: 15787
Editor: ?HelmutGrohne
Comment: address some of josch's comments

/!\ This page is work in progress and may be inaccurate. /!\ TODO: Integrate into Multiarch/InterpreterProposal.

Proposed changes to the Multiarch Spec

The current multiarch specification cannot express a few situations that actually occur in the Debian archive. This document explains the specific sub-problems and proposes solutions. It is based on discussions that happened during DebConf13.

Issues

This section describes known limitations of the current specification and explains the kind and number of packages affected.

Library extensions

Consider a shared library in one package and an extension to this library in a different package. As soon as the library becomes M-A:same, it can be installed for multiple architectures simultaneously. To extend the library, the extension must be available for the same architectures as is the library. Currently there is no way to express this condition to dpkg. The issue comes in two flavours.

LD_PRELOAD

It is possible to change the behaviour of the C library by exporting the LD_PRELOAD environment variable. A package that uses this technique usually has a shared library and a shell script that sets up LD_PRELOAD to point to its library. A user installing such a package would expect it to work in a multiarch environment. That means that the shared library should be available for all the architectures the user uses. Currently, such packages are only installed for the native architecture (by default) and are not available for foreign architectures. As of this writing, the only tool that can be used on foreign architectures at all is fakechroot, because it depends on an M-A:same libfakechroot. A user can install libfakechroot for multiple architectures, but this does not happen automatically.

Usually there is no dependency relation between any of these tools and the programs whose behaviour is changed.
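The mechanism and the multiarch gap can be sketched in Python. This is a minimal illustration and not how any of the packages listed on this page is implemented: the function names, the `{triplet}` placeholder syntax and the `exists` parameter are assumptions made for the example.

```python
import os


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with library_path prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic loader accepts a space- or colon-separated list of objects.
    env["LD_PRELOAD"] = library_path if not existing else library_path + " " + existing
    return env


def missing_triplets(path_template, triplets, exists=os.path.exists):
    """List the triplets for which the preload library file is not present.

    path_template is a hypothetical template such as
    "/usr/lib/{triplet}/libfakeroot/libfakeroot-sysv.so".
    """
    return [t for t in triplets if not exists(path_template.format(triplet=t))]
```

A wrapper would pass the result of `preload_env` as the `env` argument when starting the target program. A non-empty result from `missing_triplets` corresponds to the situation described above: the tool works for native binaries but silently has no effect on binaries of the missing architectures.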

Affected packages:

  • cowdancer: /usr/lib/cowdancer/libcowdancer.so

  • eatmydata: /usr/lib/libeatmydata/libeatmydata.so

  • datefudge: /usr/lib/datefudge/<triplet>/datefudge.so

  • devscripts: /usr/lib/devscripts/libvfork.so.0

  • fakechroot: /usr/lib/<triplet>/fakechroot/libfakechroot.so from libfakechroot

  • fakeroot: /usr/lib/<triplet>/libfakeroot/libfakeroot-*.so

  • faketime: /usr/lib/<triplet>/faketime/libfaketime*.so.1 from libfaketime

  • fl-cow: /usr/lib/fl-cow/libflcow.so

  • libc-bin: /lib/x86_64-linux-gnu/libSegFault.so from libc6

  • libroar-compat2: /usr/lib/x86_64-linux-gnu/roaraudio/complibs/*.so

  • postgresql-client-common: /lib/<triplet>/libreadline.so.6 from libreadline6

  • pulseaudio-utils: /usr/lib/<triplet>/pulseaudio/libpulsedsp.so

  • sdate: /usr/lib/libsdate/libsdate.so

  • torsocks: /usr/lib/torsocks/libtorsocks.so

  • tsocks: /usr/lib/libtsocks.so

  • vde2: /usr/lib/vde2/libvdetap.so

Library plugin

A shared library may provide an interface for extension by loading further libraries at runtime. Two examples of this technique are PAM and NSS. PAM modules are loaded dynamically into programs that use libpam for authenticating users. NSS modules are loaded dynamically into programs that use the C library for name or user resolution. In both cases, programs link against libpam0g or libc6, which are both M-A:same. Neither usually expresses any dependency relation on the modules, so it is possible that the modularized library is installed for multiple architectures, but the configured extension modules are not.

Usually there is no dependency relation between these plugins and programs that are affected by linking libpam0g or libc6.

Affected packages:

  • libpam-* (~70)

  • libnss-* (~10)

TODO: update numbers against sid

  • josch: are there more examples than PAM and NSS? Is there a script to list all of them?
    • helmut: I am aware of no other examples. Of course that doesn't mean there aren't any. Probably there are more plugin mechanisms (e.g. gstreamer), but they are not as central and do not affect as many users.
  • josch: link script to find numbers of affected packages or do the expansions of libpam-* and libnss-* yield no false positives?
    • helmut: I used the expansion of package names as a basis. Of course there is e.g. libpam-doc, but the number of false positives is low here. An example measure is aptitude search '?and(?name(libpam-),?architecture(amd64))' | wc -l.
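The name-expansion estimate discussed above can be approximated offline. The following is a rough sketch, assuming a plain list of package names is already available (for instance extracted from a Packages file). The function name and the explicit false-positive list are hypothetical, and the glob matching used here is only an approximation of the aptitude search pattern.

```python
from fnmatch import fnmatchcase


def count_plugins(package_names, pattern, known_false_positives=()):
    """Count names matching a glob pattern, skipping names known not to be plugins."""
    excluded = set(known_false_positives)
    return sum(
        1
        for name in package_names
        if fnmatchcase(name, pattern) and name not in excluded
    )


# Illustrative input only; a real run would use the full package list of one architecture.
names = ["libpam-doc", "libpam-modules", "libpam-krb5", "libnss-ldap", "libc6"]
pam_count = count_plugins(names, "libpam-*", known_false_positives=["libpam-doc"])
nss_count = count_plugins(names, "libnss-*")
```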

Interpreter issue

Interpreted languages such as Perl or Python can be extended with architecture dependent as well as architecture independent modules that may interact with each other. This case was envisaged when the current multiarch specification was written. The idea was that interpreters should be marked with M-A:allowed. Then architecture independent modules could have their interpreter dependencies annotated with :any. What has happened instead is that embeddable interpreters are marked with M-A:same, effectively allowing them to be available in multiple architectures at the same time. The availability of interpreters as shared libraries renders this dependency annotation with :any unusable. A module using such annotations would introduce architecture boundaries where there are none. A good explanation of the issues, including examples, was given by Guillem Jover. Essentially, the usability of an architecture independent module on a particular architecture depends on the availability of all of its recursive dependencies in that architecture. This restriction currently cannot be expressed to the dependency system, and therefore all architecture independent modules are considered to have the native architecture.

Affected packages:

  • Java: ~30 (lib*-java)

  • Mono: ~150 (lib*-cil)

  • Perl: ~170 (lib*-perl)

  • Python: ~80 (python-*, python3-*)

  • Ruby: ~30 (lib*-ruby)

  • Others: ~120 (upper bound, many false positives)

TODO: refresh numbers

TODO: link to generation script

Note that not all languages mentioned above can be embedded, but at least Perl, Python and Ruby can. A lower bound on the number of affected packages therefore is ~280.

In all of these cases, there is a dependency path starting in one of the affected architecture independent modules, passing an architecture dependent module, and ending in an interpreter package.
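The dependency-path criterion can be written down as a small graph search. This is a sketch over a toy data model, not the script used to produce the numbers above: the dictionary layout, the function name and the package names in the example are hypothetical.

```python
def is_affected(name, packages, interpreters):
    """True if an Arch:all package reaches an interpreter via an architecture dependent module.

    packages maps a name to {"arch": "all" or "any", "depends": [names]}.
    """
    if packages[name]["arch"] != "all":
        return False
    # Depth-first search over (package, passed_arch_dependent_module) states.
    stack = [(dep, False) for dep in packages[name]["depends"]]
    visited = set()
    while stack:
        current, passed = stack.pop()
        if (current, passed) in visited:
            continue
        visited.add((current, passed))
        if current in interpreters:
            if passed:
                return True
            continue  # a direct path to the interpreter does not count
        if current not in packages:
            continue
        next_passed = passed or packages[current]["arch"] == "any"
        stack.extend((dep, next_passed) for dep in packages[current]["depends"])
    return False


# Hypothetical module names for illustration.
toy = {
    "libfoo-perl": {"arch": "all", "depends": ["libbar-perl", "perl"]},
    "libbar-perl": {"arch": "any", "depends": ["perl"]},
    "libbaz-perl": {"arch": "all", "depends": ["perl"]},
}
```

In this toy archive only `libfoo-perl` is affected: it is architecture independent, but one of its paths to `perl` passes through the architecture dependent `libbar-perl`.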

Solutions

The solutions presented here are roughly ordered by ascending complexity.

Conversion to Arch:any M-A:same

The interpreter issue can be mitigated by turning affected architecture independent modules into architecture dependent packages marked with M-A:same.

Solves: interpreter

Pros:

  • The installation size does not change even though more packages are to be installed.
  • This solution does not require any changes to the infrastructure or the package management.

Cons:

  • The mirror and buildd usage grows.
  • This solution is fragile in that the addition of any package can cause existing modules to become affected.
  • Maintainer scripts must be careful with respect to bytecode removal when removing only one architecture of a package.
  • Some package splits are necessary (see "Running architecture" Cons).
  • A large number of packages needs to be changed (see lower bound in "Interpreter issue").
  • josch: why does the installation size not change? What am I missing?
    • helmut: The co-installed M-A:same packages ship exactly the same files. This is permitted by dpkg.

Multi-Arch: all

A new value "all" for the Multi-Arch header can be added. This value implies the semantics of the value "same". In addition, it causes the package to be automatically installed for all native and foreign architectures configured with dpkg.

Solves: library extension, interpreter

Cons:

  • dpkg --add-architecture results in an inconsistent state.

  • Vastly overapproximates and causes more packages to be installed than necessary.
  • Must be disabled for crossbuilding or bootstrapping the toolchain of a new architecture to use fakeroot.

  • Inherits cons from "Conversion to Arch:any M-A:same".

  • josch: upon running dpkg --add-architecture the additional packages would have to be retrieved through apt-get -f install - why is that a problem?

    • helmut: It's not a problem, but it isn't nice either.
  • josch: how vast is vastly in a real-life setup? Given that this only applies to python, perl, nss, pam, etc this probably does not take much space wrt to megabytes? What would the maximum value be per added architecture? Systems with space constraints (embedded) do not have more than one arch anyways, no? Even small arm devices have >32G flash nowadays.

    • helmut: More research is needed here. It is not that much of a problem in terms of disk space, but as soon as you install an (embedded) interpreter for a second architecture you effectively duplicate the libraries. Specifically, this will worsen the "version-lock" issue, where one architecture is behind in terms of package building. Can you improve the wording of the item with this background?
  • josch: can the bootstrapping issue not be solved with fake equivs packages?
    • helmut: Yes, but it remains a stumbling block.
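The intended semantics of the proposed value, and the inconsistent state after adding an architecture, can be illustrated with a small model. This is only a sketch of one reading of the proposal (an installed package must be present for every configured architecture). The function name and data layout are hypothetical and no dpkg interface is used.

```python
def missing_installations(installed, configured_architectures, multiarch_all):
    """Return (package, architecture) pairs still required under the proposed value.

    installed maps a package name to the set of architectures it is installed for.
    multiarch_all is the set of packages assumed to carry the proposed value.
    """
    missing = []
    for package in sorted(multiarch_all):
        have = installed.get(package, set())
        if not have:
            continue  # not installed at all, so nothing is required
        for arch in sorted(set(configured_architectures) - have):
            missing.append((package, arch))
    return missing


# Hypothetical package name and install state.
installed = {"libpam-foo": {"amd64"}}
before = missing_installations(installed, {"amd64"}, {"libpam-foo"})
after = missing_installations(installed, {"amd64", "armhf"}, {"libpam-foo"})
```

Here `before` is empty, while `after` lists the armhf installation that becomes necessary the moment the architecture is configured. That is the inconsistency the first item in the list of cons refers to.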

Install-same-as header

The basic idea is to tell packages to be installed in all architectures that a given other package is installed for. To that end, a new optional header for binary packages can be added. The value contains one of its dependencies. A package carrying this header must be M-A:same. It is only considered installed if it is installed for all architectures that the listed package is installed for.

In case of libpam0g and libc6 all plugins and LD_PRELOAD libraries would list these packages in the new header. Architecture dependent modules to interpreters would list the respective shared interpreter library package.

Solves: library extension, interpreter

Cons:

  • Installing a foreign interpreter or C library causes all extensions to be installed as well even if not all are needed.
  • Must be disabled for crossbuilding or bootstrapping the toolchain of a new architecture to use fakeroot.

  • josch: same question as above: how much more space (at maximum) would be used per architecture for this option?
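The rule "only considered installed if it is installed for all architectures that the listed package is installed for" amounts to a subset check. The following sketch models it with plain dictionaries. The function name, the data layout and the plugin name are hypothetical, and the header itself is only a proposal.

```python
def considered_installed(package, installed, install_same_as):
    """Apply the proposed rule to one package.

    installed maps a package name to the set of architectures it is installed for.
    install_same_as maps a package to the package named in its proposed header.
    """
    have = installed.get(package, set())
    if not have:
        return False
    reference = install_same_as.get(package)
    if reference is None:
        return True  # no header: ordinary behaviour
    # Every architecture of the referenced package must be covered.
    return installed.get(reference, set()) <= have


# Hypothetical plugin name and install state.
header = {"libnss-foo": "libc6"}
partial = {"libc6": {"amd64", "i386"}, "libnss-foo": {"amd64"}}
complete = {"libc6": {"amd64", "i386"}, "libnss-foo": {"amd64", "i386"}}
```

With the `partial` state the plugin does not count as installed, because libc6 is also present for i386. With the `complete` state it does.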

Running architectures

The idea behind this approach is very similar to turning relevant packages into Architecture: any and M-A:same. The major difference here is that it is done internally to dpkg and applied to every architecture independent package automatically. Instead of considering architecture independent packages to be installed for the native architecture, dpkg tracks the set of architectures for which they are considered installed. New operations are added to dpkg to augment or shrink these sets. For instance, when removing a dependency of an architecture independent package, the set may need to be shrunk. On the other hand, installing a package can be done without extending architecture sets of other packages. Thus the dpkg state underapproximates the available functionality provided by packages. At a later time one may notice that the dependencies of an architecture independent package are satisfied in another architecture and then extend its corresponding architecture set.

In the context of the interpreter issue, this extension causes architecture independent modules to inherit the architecture information of the interpreters. For this to work with embedded interpreters, modules need to add the embedded interpreters (being M-A:same) as an alternative to their main interpreter dependency (at least in the Perl and Python worlds).
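How an architecture set could be derived is sketched below. The model is deliberately simplified and is an assumption of this example rather than part of the proposal text: dependencies are plain names without alternatives or M-A:foreign handling, and the largest possible set is taken to be the intersection of the sets of all dependencies. All names are hypothetical.

```python
def possible_architectures(name, depends, installed_archs, configured, _active=frozenset()):
    """Architectures for which an architecture independent package could count as installed.

    depends maps architecture independent packages to their dependencies.
    installed_archs maps architecture dependent packages to their installed architectures.
    configured is the set of architectures known to the package manager.
    """
    if name in installed_archs:
        return set(installed_archs[name])
    if name in _active or name not in depends:
        return set()  # cycle or unknown package: stay on the underapproximating side
    result = set(configured)
    for dep in depends[name]:
        result &= possible_architectures(
            dep, depends, installed_archs, configured, _active | {name}
        )
    return result


# Hypothetical packages: a tool with private modules for two interpreters.
depends = {"tool": ["perl-mod", "python-mod"], "perl-mod": ["perl"], "python-mod": ["python"]}
installed_archs = {"perl": {"amd64", "i386"}, "python": {"amd64"}}
configured = {"amd64", "i386"}
```

In this state `perl-mod` could be extended to both architectures, while `tool` is limited to amd64 because its Python side is only available there. A real implementation would store a possibly smaller set and extend it on demand, as described above.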

Solves: interpreter

Pros:

  • No changes to the package metadata needed.
  • Many packages need not change at all to benefit.

Cons:

  • Significant changes to dpkg and apt.
  • Some packages need to be split.
    • Each package is considered to be running code of a single architecture.

      When a package contains private modules in both Perl and Python (such as devscripts), the package is only considered to be installed for an architecture if both the Perl and Python interpreters are available for the same architecture, even though this is by no means required by the package. The issue can be avoided by splitting such packages into M-A:foreign architecture independent packages containing the parts in use by individual interpreters.

TODO: find a methodology for estimating the number of affected packages

Running architectures with purpose tracking

As the title suggests, this is an extension to the "Running architectures" proposal. It addresses the package splitting aspect by tracking multiple architectures for architecture independent packages. Interpreter packages (M-A:allowed) are used as special markers in the dependency tree and induce new sets of architectures. A package (possibly indirectly) depending on e.g. Python is then considered to be installed for the purpose of using it with Python on a set of architectures that may be different from the set of architectures for other purposes. When resolving dependencies, all M-A:allowed packages in the transitive dependency set are considered as purposes. For each purpose, the subset of packages indirectly depending on that purpose is considered. Dependencies are resolved within such a purpose and a set of running architectures is determined. In addition, there is an empty purpose covering all packages. A package's dependencies are considered satisfied if dependencies are satisfied with respect to the empty purpose or if all dependencies indirectly depend on some purpose package and its dependencies are satisfied with respect to all purposes except for the empty purpose. In practice, this means that packages using e.g. both Perl and Python are no longer considered to be installed for a single architecture. Instead the package may be running on a mix of architectures. Unless such a package is M-A:foreign itself, it will not be able to fulfil dependencies on it at all.

Pros: (see also "Running architectures")

  • Can be implemented on top of the "Running architectures" approach if it proves insufficient.
  • Compared to "Running architectures" it reduces the number of packages by not requiring them to be split (e.g. isenkram).

Cons:

  • Even further changes to dpkg and apt.
  • Explosion in state tracking as the dpkg state grows with number of packages times number of M-A:allowed packages times foreign architectures.

  • josch: can you give an actual example (including) which demonstrates your explanations and uses your established terminology? For example when I do grep-dctrl -F Multi-Arch allowed -s Package I only see python stuff, cpp, libthrust-dev and perlmagick to be M-A:allowed. Why?

Example: isenkram

The isenkram package currently depends on both Perl modules (e.g. libgnome2-perl) and Python modules (e.g. python-gobject). All of its dependencies are indirect dependencies of either perl or python, which are intended to be M-A:allowed. So its dependencies can be partitioned into those that are part of the perl purpose and those that are part of the python purpose, and each part can be used with a different architecture. In the "Running architectures" proposal without purpose tracking this would not be possible, and isenkram would need to be split into two packages to make the usage of components explicit.
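The partitioning for isenkram can be sketched as follows. This is a simplified model under stated assumptions: only direct dependencies are grouped, the empty purpose and the full satisfaction rule are left out, and the dependency lists and install state are reduced to the two modules named above. The function names and the install state are hypothetical.

```python
def reaches(name, target, depends, _seen=None):
    """True if name is target or (indirectly) depends on target."""
    seen = set() if _seen is None else _seen
    if name == target:
        return True
    if name in seen:
        return False
    seen.add(name)
    return any(reaches(dep, target, depends, seen) for dep in depends.get(name, ()))


def partition_by_purpose(name, depends, purposes):
    """Group the direct dependencies of name by the purpose package they lead to."""
    return {
        purpose: sorted(d for d in depends.get(name, ()) if reaches(d, purpose, depends))
        for purpose in sorted(purposes)
    }


def running_architectures(name, depends, purposes, installed_archs):
    """Per purpose, the architectures on which the purpose and its dependencies are installed."""
    result = {}
    for purpose, deps in partition_by_purpose(name, depends, purposes).items():
        archs = set(installed_archs.get(purpose, set()))
        for dep in deps:
            archs &= installed_archs.get(dep, set())
        result[purpose] = archs
    return result


# Reduced dependency data and a hypothetical mixed-architecture install state.
depends = {
    "isenkram": ["libgnome2-perl", "python-gobject"],
    "libgnome2-perl": ["perl"],
    "python-gobject": ["python"],
}
installed_archs = {
    "perl": {"amd64"}, "libgnome2-perl": {"amd64"},
    "python": {"i386"}, "python-gobject": {"i386"},
}
```

With this state the perl purpose runs on amd64 and the python purpose on i386, which is the mix of architectures that the proposal without purpose tracking cannot represent.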

TODO: specification

TODO: implementation plan