This is an incomplete dump of things that should probably have been done differently, but that might be too late to fix or change now (barring time travel), might require huge transitions and lots of work, or might be too difficult for little apparent gain. Of course, this should not be taken to mean that these problems should not or will not be fixed, only that fixing them might require a very brave and persistent soul to move them forward!

Architecture

  • Missing linux- prefix on Linux-based arches.

  • The ppc64/ppc64el names are inconsistent with the powerpc/powerpcel names.

  • The i386 arch bumps the GNU triplet from time to time, because supposedly packages are keying on it to select the emitted arch baseline. This will be "fixed" once i386 drops out of use. :)

  • Single-word arch names do not match what some people expect when using a Debian triplet (any-arm !~ armel), and they make third-party implementations harder than if any could simply be considered a wildcard matching the corresponding element of the tuple. See apt bug 748936.

Detailed explanation at https://lists.debian.org/debian-dpkg/2014/06/msg00007.html.
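To illustrate why single-word names complicate matching, here is a minimal Python sketch of tuple-based wildcard matching. It assumes dpkg-style semantics, where a short wildcard such as any-arm is padded on the left with any up to the four abi-libc-os-cpu elements. The ARCH_TUPLES table is a small hand-picked excerpt for illustration only (dpkg ships the authoritative table), and the function names are hypothetical.

```python
# Minimal sketch of dpkg-style architecture wildcard matching.
# ARCH_TUPLES is a hand-picked excerpt for illustration; dpkg's own
# tuple table is authoritative.
ARCH_TUPLES: dict[str, tuple[str, str, str, str]] = {
    "amd64": ("base", "gnu", "linux", "amd64"),
    "i386": ("base", "gnu", "linux", "i386"),
    "arm64": ("base", "gnu", "linux", "arm64"),
    "armel": ("eabi", "gnu", "linux", "arm"),
    "armhf": ("eabihf", "gnu", "linux", "arm"),
    "hurd-i386": ("base", "gnu", "hurd", "i386"),
}


def wildcard_to_tuple(wildcard: str) -> tuple[str, ...]:
    """Pad a short wildcard (e.g. any-arm, linux-any) on the left with 'any'."""
    parts = wildcard.split("-")
    if len(parts) > 4:
        raise ValueError(f"too many elements in {wildcard!r}")
    return ("any",) * (4 - len(parts)) + tuple(parts)


def arch_matches(arch: str, wildcard: str) -> bool:
    """Return True if the Debian arch name matches the (wildcard) arch."""
    if "any" not in wildcard.split("-"):
        return arch == wildcard  # a plain arch name needs an exact match
    # The name "armel" says nothing about its CPU, so a table lookup is required.
    arch_tuple = ARCH_TUPLES[arch]
    return all(w in ("any", a) for w, a in zip(wildcard_to_tuple(wildcard), arch_tuple))
```

With these semantics any-arm does match armel, but only because the table says armel is eabi-gnu-linux-arm; nothing in the name itself reveals the arm CPU, so every implementation needs its own copy of the table.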

Multi-Arch file refcounting

The file refcounting is something that dpkg performs for Multi-Arch:same packages, so that shared files between different architecture instances do not need to be split into common packages.

There was a big discussion (FIXME link, and a summary of some of the problems in https://lists.debian.org/debian-devel/2012/02/msg00457.html) just before the deployment of the Multi-Arch enabled dpkg in Debian, and the majority of participants deemed the requirement to split packages too much work, so refcounting was reluctantly reintroduced into the codebase.

But although file refcounting has some nice properties that save maintainers work, it also has some very nasty ones, some of which are unfixable even in principle. In addition, backpedaling on that decision would now imply quite a lot of work. The current requirements are:

  • M-A:same packages can only be configured if all of their instances are at the exact same binary version.
  • All refcounted files need to have matching md5sums.

Consequently:

  • Unmatched binNMUs make packages not co-installable.
  • binNMUs in general are by default not co-installable, due to differing changelog entries (part of this has been worked around, see below).
  • Only the last package instance can check that it matches the md5sums of the already installed refcounted files, which means differing files might not get detected.
  • Essential packages (which must work even when only unpacked) might not work at all if one of their Pre-Depends is an M-A:same shared library with a shared file unpacked from an instance of a different binary version.
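The two requirements above can be expressed as a check over all installed instances. This is a minimal Python sketch, not dpkg's code; the Instance type, the function name, and the file paths in the example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One installed architecture instance of an M-A:same package."""
    arch: str
    version: str  # binary version, e.g. "1.0-1+b1" for a binNMU
    files: dict[str, str] = field(default_factory=dict)  # path -> md5sum


def coinstall_problems(instances: list[Instance]) -> list[str]:
    """Check both refcounting requirements across all instances at once."""
    problems = []
    versions = sorted({inst.version for inst in instances})
    if len(versions) > 1:
        problems.append(f"binary versions differ: {versions}")
    first_owner: dict[str, tuple[str, str]] = {}  # path -> (arch, md5sum)
    for inst in instances:
        for path, md5 in inst.files.items():
            owner = first_owner.setdefault(path, (inst.arch, md5))
            if owner[1] != md5:
                problems.append(f"{path}: md5sum differs between {owner[0]} and {inst.arch}")
    return problems
```

Unlike this all-at-once check, dpkg sees the instances one unpack at a time, which is why only the last instance can compare its files against the already installed ones.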

The currently implemented and proposed workarounds for some of these problems have been a series of ad-hoc hacks:

  • Split the binNMU changelog entry into a different file, done automatically only for packages using debhelper.
  • Hunt down all packages that contain differences depending on the architecture, and try to make them reproducible, but this might just hide files that could end up changing depending on the program generating them.
  • Switch the binary version coherence check for all instances to be source version based. This mixes up the source and binary version spaces, and makes it akin to a magic check (although there was code for this, it was deemed not a good solution).
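A source-version-based check could derive the source version from the binary package's Source field, which may carry the source version in parentheses when it differs from the binary Version (as with binNMUs). A minimal Python sketch, with hypothetical function names and example values:

```python
import re

# Matches "name" or "name (version)" as found in a binary package's Source field.
SOURCE_RE = re.compile(r"^(?P<name>\S+)(?:\s+\((?P<version>[^)]+)\))?$")


def source_version(control: dict[str, str]) -> str:
    """Return the source version for a binary package's control fields.

    Without an explicit version in Source, it equals the binary Version.
    """
    source = control.get("Source") or control["Package"]
    m = SOURCE_RE.match(source.strip())
    if m is None:
        raise ValueError(f"malformed Source field: {source!r}")
    return m.group("version") or control["Version"]


def same_source_version(instances: list[dict[str, str]]) -> bool:
    """The relaxed coherence check: only the source versions must agree."""
    return len({source_version(c) for c in instances}) == 1
```

Instances at 1.0-1+b1 and 1.0-1+b2 built from the same 1.0-1 source now pass, even though their binaries, and thus their shared files, may differ; this is the mixing of version spaces described above.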

Ideally:

  • To avoid a flag day, Helmut Grohne proposed adding a new Multi-Arch field value, with semantics similar to same but implying no refcounting.

  • Split refcounted files into their own common packages.
  • Move at least changelog files into the .deb control area, and consequently to the dpkg db.
    • This would also make it possible to transparently compress and deduplicate those files, without needing flaky directory-to-symlink dances back and forth.
  • At some point in the future, when not needed at all, disable refcounting completely? (breaks compatibility and might not be possible at all, ever).

.dsc

  • Conflation of .dsc format and source package format in the Format field, so changing fields now might not be possible w/o bumping the format version. Possible solutions could be to:

    • Use the minor version for the .dsc format (semantic break for the field)
    • Add a new field (makes its usage inconsistent with other files)
    • Bump the Format to 4.0, and specify that in this file format the source format is stored in Source-Format or similar and untangle them permanently restoring uniformity.

  • Maintainer and Uploaders fields have confusing names.

  • Package-List might perhaps have been better called Binary-List, with no Binary field.

  • Binary redundant with Binary-List (Package-List).
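Today a parser cannot tell the two meanings of Format apart, because a single value such as 3.0 (quilt) carries both. The following Python sketch shows how the third option would untangle them; the Format 4.0 handling and the Source-Format field follow the hypothetical proposal above, not anything dpkg implements, and the function name is made up.

```python
import re

# Current .dsc Format values look like "1.0", "3.0 (quilt)" or "3.0 (native)".
FORMAT_RE = re.compile(r"^\d+\.\d+(?: \([a-z0-9]+\))?$")


def split_formats(dsc: dict[str, str]) -> tuple[str, str]:
    """Return (dsc_file_format, source_package_format) for parsed .dsc fields."""
    fmt = dsc["Format"].strip()
    if fmt == "4.0":
        # Hypothetical: the source package format moves to its own field.
        return fmt, dsc["Source-Format"].strip()
    if not FORMAT_RE.match(fmt):
        raise ValueError(f"unknown Format: {fmt!r}")
    # Today one value answers both questions, so neither can change alone.
    return fmt, fmt
```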

.deb size limit

The ar container has a hard limit on each member size (around 9536.74 MiB), because the member size is stored in a field of at most 10 ASCII decimal digits. This cannot be fixed without breaking compatibility with the ar format. Possible solutions could be to:

  • Use some other container format. But this would break detection, as the archive would no longer be recognized as a .deb. Painful.

  • Split the large ar members into different tar slices (these could be the original data.tar.COMP byte stream sliced, or independent tar archives). For example: control.tar.xz, data-00.tar.xz, data-01.tar.xz, etc., i.e. data-NN.tar.xz, where NN can be an index from 00 to [0-9a-z]{2}. This would be easy to handle by hand with just cat and dd. This might require either bumping the deb format version to 3.0, or using a different magic member (such as debian-slices or debian-binslice). Candidate.

  • Bump major version to 3, and use an ar container, concatenated with something else (perhaps a PAX archive?). This would be easy to create (ar + other-tool >> archive) and not difficult to extract with standard tools (dd). Non-standard.

  • Bump major version to 3, and use an overlapping ar+PAX container for the first entry. PAX removes all limits imposed by ustar and ar, and both the ar and PAX headers start at offset 0 with the name field. In PAX we get 100 characters to use as we want, as the name field is supposed to be ignored, and the complete ar header plus the ar magic are 68 bytes long, which means they fit entirely in the PAX name field. So we'd use an uncompressed PAX container with a backwards compatible ar header embedded in the first PAX entry, so that other software could detect these archives as .debs. One very tiny problem with a PAX container is that it has more overhead for an empty package, but that's on the order of a few KiB. Proof-of-concept. Crazy?

  • Revert major version to 0.939000, which has no size limit, but has extremely poor support in tools and libraries, and is not extensible. Compression algorithms other than gzip would need header checks, but as the start of a tarball has a known format, there are no false positives. Non-starter.
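The arithmetic behind the size limit, the 68-byte figure, and the slice naming can be checked with a short Python sketch. It assumes the common ar member header layout (16-byte name, 12-byte mtime, 6-byte uid and gid, 8-byte octal mode, 10-byte decimal size, 2-byte terminator) and, for the slices, that the two-character index counts through 0-9a-z in order; ar_member_header(), slice_names() and the constant names are hypothetical.

```python
import string

AR_MAGIC = b"!<arch>\n"
AR_HEADER_LEN = 60
MAX_MEMBER_SIZE = 10**10 - 1  # ar_size holds at most 10 decimal digits
USTAR_NAME_LEN = 100  # size of the ustar/PAX header name field
INDEX_DIGITS = string.digits + string.ascii_lowercase  # assumed order: 0-9a-z


def ar_member_header(name: str, size: int, mtime: int = 0) -> bytes:
    """Build an ar member header: name, mtime, uid, gid, mode, size, terminator."""
    if len(name) > 16:
        raise ValueError(f"name too long for ar: {name!r}")
    if not 0 <= size <= MAX_MEMBER_SIZE:
        raise ValueError(f"{size} bytes do not fit in the ar size field")
    fields = f"{name:<16}{mtime:<12}{0:<6}{0:<6}{0o100644:<8o}{size:<10}"
    return fields.encode("ascii") + b"`\n"


def slice_names(data_size: int, slice_size: int, comp: str = "xz") -> list[str]:
    """Hypothetical data-NN.tar.COMP names for slicing a large data member."""
    if not 0 < slice_size <= MAX_MEMBER_SIZE:
        raise ValueError("each slice must fit in a single ar member")
    count = -(-data_size // slice_size)  # ceiling division
    base = len(INDEX_DIGITS)
    if count > base * base:
        raise ValueError("too many slices for a two-character index")
    return [
        f"data-{INDEX_DIGITS[i // base]}{INDEX_DIGITS[i % base]}.tar.{comp}"
        for i in range(count)
    ]
```

Here 9999999999 bytes divided by 2^20 gives the 9536.74 MiB limit, and 8 bytes of magic plus a 60-byte header give the 68 bytes that fit in the 100-byte name field. For a 25 × 10^9 byte data.tar.xz and maximally sized slices, slice_names() yields data-00.tar.xz to data-02.tar.xz, and a two-character index allows at most 36 × 36 = 1296 slices.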

Build profiles

Defaults

Debian started without build profiles and with the philosophy of building source packages with all features enabled. As a result, instead of having positive build profile names (like in Gentoo) which enable features, we must use negative build profile names which disable features. This leads to a double negative for build dependencies that are required to compile a source package with a certain feature:

Build-Depends: foo <!nofoo>

If Debian had had the concept of build profiles from the start (in the same way that Gentoo has USE flags), then the above could have been rewritten in the much more readable and less confusing form:

Build-Depends: foo <foo>
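In a restriction formula each <...> group is a conjunction of terms, multiple groups are ORed, and ! negates a term. A minimal Python evaluator (hypothetical function name, not dpkg's parser) shows how both forms behave depending on the active profiles:

```python
import re


def restriction_satisfied(formula: str, active: set[str]) -> bool:
    """Evaluate a build profile restriction formula such as '<!nofoo>'.

    Each <...> group is a conjunction of terms; the groups are ORed;
    a leading '!' negates a term.
    """
    groups = re.findall(r"<([^>]*)>", formula)
    if not groups:
        return True  # no restriction: the dependency always applies

    def term_ok(term: str) -> bool:
        if term.startswith("!"):
            return term[1:] not in active
        return term in active

    return any(all(term_ok(t) for t in group.split()) for group in groups)
```

With no active profiles, foo <!nofoo> is pulled in, matching Debian's build-everything default, while the positive foo <foo> would only be pulled in once foo is explicitly enabled.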

Namespace separator

The package specific namespace separator is «.», but that is a valid character for a package name, so it was not a very good choice. A better separator would have been «/».
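Taking package-specific build profiles of the form pkg.<source>.<name> as an example, and a hypothetical source package called foo.bar, a parser cannot tell where the source package name ends. This Python sketch (hypothetical function name) enumerates the possible readings for a given separator:

```python
def possible_splits(profile: str, sep: str = ".") -> list[tuple[str, str]]:
    """All (source, name) readings of a pkg<sep>source<sep>name identifier."""
    prefix = "pkg" + sep
    if not profile.startswith(prefix):
        return []  # not a package-specific name
    parts = profile[len(prefix):].split(sep)
    # Every split point between the parts is a valid reading.
    return [(sep.join(parts[:i]), sep.join(parts[i:])) for i in range(1, len(parts))]
```

With «.», pkg.foo.bar.nodoc has two readings; with «/», which cannot appear in a package name, pkg/foo.bar/nodoc has only one.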

Built-Using field

This field was concocted in Debian to track shadow dependencies that are not commonly represented in the usual fields. But its scope was reduced to cover only license-required dependencies (such as the ones imposed by some copyleft licenses). Its quite generic name has been a source of confusion, as people have tried to use this same field to also track other related shadow dependencies that do not stem from license requirements, such as plain static linking dependencies.

There's now a field called Static-Built-Using (supported since dpkg 1.21.3), which should give a better name to the latter case. But that still leaves the original field with a poor name, when it might have been better named something like License-Built-Using or similar.