Differences between revisions 49 and 51 (spanning 2 versions)
Revision 49 as of 2017-09-06 05:36:48
Size: 21095
Editor: PaulWise
Comment: typos
Revision 51 as of 2018-02-16 01:30:00
Size: 25607
Editor: GuillemJover
Comment: Add an entry for epochs, their purpose and their intended usage

User Questions

Q: What can be done when the dpkg lock is held?

A: This means another instance of dpkg or a frontend is running, so the correct solutions are to:

  • let the other process end its execution,
  • find the frontend and quit it, or
  • kill any such frozen processes.

Removing the dpkg lock file is not a correct solution, as dpkg uses region locking on an existing file. The presence of the file does not mean that the lock is currently being held, and removing it can easily cause dpkg database corruption.
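As a rough aid for finding the frontend mentioned above, the following sketch lists the processes that have the lock file open. It assumes fuser (from psmisc) and a POSIX ps are available, and that the lock lives at /var/lib/dpkg/lock; the function name dpkg_lock_users is made up for this example. Having the file open is only a hint, not proof that the lock is held.

```shell
# List PID and command name of every process that has the lock file open.
# Prints nothing when fuser is not installed or nobody has the file open.
dpkg_lock_users() {
    lockfile=${1:-/var/lib/dpkg/lock}
    command -v fuser >/dev/null 2>&1 || return 0
    # fuser writes the PIDs to stdout and everything else to stderr.
    for pid in $(fuser "$lockfile" 2>/dev/null); do
        ps -o pid= -o comm= -p "$pid"
    done
}
```

Running dpkg_lock_users as root with no argument checks the default lock file. The lock file itself should still be left alone.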

Q: What can be done when dpkg fails due to a broken maintainer script?

A: Report the problem to the package maintainer, and upgrade to the fixed version.

Even if what you wanted or needed was to remove the package, you should always upgrade to the fixed version and then remove it.

Trying to edit the dpkg database or using a hypothetical --force option (8639) is never the right solution, as that might leave cruft behind at best or a broken system at worst. These might be convenient or necessary, but never correct.

Q: What are dpkg's filesystem requirements?

A: The filesystems are expected to have POSIX semantics.

dpkg expects to operate on filesystems that support at least:

  • hardlinks (163988, 825945),

  • symlinks,
  • returning the symlink length in stat(2) st_size,

  • file-region locking (fcntl(2) with F_SETLK; 134591, 149491),

  • replacing files currently open (and being executed) (with rename(2)),

  • renaming directories atomically inside the same parent directory (with rename(2)).

As such, filesystems like FAT32 or AFS are not supported for dpkg managed content.
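A quick way to see whether a mounted filesystem offers some of these features is to probe a scratch directory on it. The sketch below is only illustrative: the function name probe_dpkg_fs is made up, it assumes a stat that supports -c (GNU coreutils or BusyBox), and it does not check file-region locking or the atomicity of the renames.

```shell
# Probe a directory for some of the filesystem features dpkg relies on.
# Prints one line per missing feature and returns non-zero if any is missing.
probe_dpkg_fs() {
    work=$(mktemp -d "$1/fsprobe.XXXXXX") || return 1
    rc=0
    printf 'old\n' > "$work/file"

    # Hardlinks.
    ln "$work/file" "$work/hardlink" 2>/dev/null ||
        { echo "hardlinks unsupported"; rc=1; }

    # Symlinks, and the symlink length reported in st_size
    # (the target name "file" is 4 bytes long).
    if ln -s file "$work/symlink" 2>/dev/null; then
        [ "$(stat -c %s "$work/symlink" 2>/dev/null)" = 4 ] ||
            { echo "symlink length not reported in st_size"; rc=1; }
    else
        echo "symlinks unsupported"; rc=1
    fi

    # Replacing a file while it is held open (fd 3 lives in the subshell).
    printf 'new\n' > "$work/file.new"
    ( exec 3< "$work/file" && mv -f "$work/file.new" "$work/file" ) 2>/dev/null ||
        { echo "cannot replace an open file"; rc=1; }

    # Renaming a directory inside the same parent directory.
    { mkdir "$work/dir.old" && mv "$work/dir.old" "$work/dir.new"; } 2>/dev/null ||
        { echo "cannot rename directories"; rc=1; }

    rm -rf "$work"
    return "$rc"
}
```

For example, probe_dpkg_fs /boot/efi on a FAT32 EFI partition would be expected to report missing hardlinks and symlinks, while a directory on a POSIX filesystem should print nothing.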

If you are installing on an NFS root, and locking does not work, make sure rpc.lockd (or its equivalent) is running.

If you need a FAT32 partition, for example to boot with EFI, then you should mount it somewhere like /boot/efi/, which will not be managed by dpkg, and use kernel postinst hooks to copy any required files, in a similar way as would be done to create an initramfs.

Q: Why is dpkg so slow when using new filesystems such as btrfs or ext4?

A: To guarantee that the filesystem data is always consistent and safe, dpkg performs fsync(2)s on its database and on the files unpacked from packages. Newer filesystems (like btrfs or ext4) that implement delayed allocation trade data safety for performance and expect programs to perform those fsync(2)s themselves; without them, a system crash or abrupt shutdown might leave zero-length files behind. At the same time, these filesystems have shown very poor performance on exactly the behaviour they require from all applications.

Here's a list of available solutions and workarounds you might consider using:

  • For ext4, use the "nodelalloc" mount option instead, which should fix both the performance degradation and the data safety issues, not just for dpkg but for any application in the system.
  • Switch to another filesystem entirely that does not force choosing between acceptable performance and data safety.
  • Use the dpkg --force-unsafe-io option, which avoids those fsync(2)s on filesystem files, but still performs them on the dpkg database. This option might improve performance at the cost of losing data, so use it with care.

  • If --force-unsafe-io is still not good enough, you might consider using (in addition or instead) eatmydata, which will get rid of any fsync(2) at all. You might want to have backups around, though.

(See 430958, 567089, 578635, 584254, 588254, 588339, 595927, 600075, 605009, 635993 and https://bugzilla.kernel.org/show_bug.cgi?id=15910 for all the gory details.)
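As a sketch of how the --force-unsafe-io workaround could be made persistent: dpkg configuration files accept long option names without the leading dashes, so a one-line fragment is enough. The function name enable_unsafe_io and the fragment name unsafe-io are made up for this example, and the default directory assumes the usual /etc/dpkg/dpkg.cfg.d location.

```shell
# Write a dpkg configuration fragment that enables force-unsafe-io.
# The target directory can be overridden, which is handy for trying it out.
enable_unsafe_io() {
    cfgdir=${1:-/etc/dpkg/dpkg.cfg.d}
    # Same spelling as the command-line option, minus the leading "--".
    printf '%s\n' 'force-unsafe-io' > "$cfgdir/unsafe-io"
}
```

For a single run the option can be passed directly instead, as in dpkg --force-unsafe-io -i package.deb, and eatmydata is used as a wrapper, as in eatmydata dpkg -i package.deb (package.deb being a placeholder). The data-loss caveats above apply in all cases.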

Q: Why does dpkg --set-selections not record selections for unknown packages?

A: Due to multiarch, the dpkg command-line interface for package name input needed to change. To avoid dpkg being unable to refer to packages in its own database, selections for unknown packages are no longer recorded. The available database therefore needs to be up-to-date before any selection is set; this also has the benefit of warning about unknown packages, so that the user knows what will be missing. There are several ways to update the available database, depending on the package manager frontend being used; any single one of them is enough:

  • apt-based (apt, aptitude, synaptic, etc.):

    • (from apt, with dpkg 1.17.7 and later)

          apt-cache dumpavail | dpkg --merge-avail

    • (from apt, with dpkg 1.17.6 and earlier)

          avail=$(mktemp)
          apt-cache dumpavail >"$avail"
          dpkg --merge-avail "$avail"
          rm "$avail"

    • (from apt)

          /usr/lib/dpkg/methods/apt/update /var/lib/dpkg/

    • (from dctrl-tools)

          sync-available

    • (from dselect, using the apt access method)

          dselect update

  • cupt:

    • (from cupt, with dpkg 1.17.7 and later)

          cupt show -a '*' | grep -E -v '^(Status|[^:]*\s.*:)' | dpkg --merge-avail

    • (from cupt, with dpkg 1.17.6 and earlier)

          avail=$(mktemp)
          cupt show -a '*' | grep -E -v '^(Status|[^:]*\s.*:)' > "$avail"
          dpkg --merge-avail "$avail"
          rm "$avail"

  • dselect:

    • (from dselect, using any of the builtin access methods)

          dselect update
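Putting the pieces together, a typical use is carrying a package selection from one system to another. The sketch below assumes an apt-based system with dpkg 1.17.7 or later; the function names selected_for_install and restore_selections are made up for this example, and restore_selections must be run as root.

```shell
# Print the packages marked "install" in `dpkg --get-selections` output
# read from stdin (one "name<whitespace>state" pair per line).
selected_for_install() {
    awk '$2 == "install" { print $1 }'
}

# Restore a saved selections file: refresh the available database first,
# so that dpkg knows about the packages and records their selections.
restore_selections() {
    apt-cache dumpavail | dpkg --merge-avail &&
    dpkg --set-selections < "$1" &&
    apt-get dselect-upgrade
}
```

On the old system, dpkg --get-selections > selections.txt saves the list. On the new one, selected_for_install < selections.txt previews what will be installed, and restore_selections selections.txt applies it.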

Q: What is the latest dpkg version on each Debian release?

A: Starting with Debian 4.0, each Debian release starts a new dpkg minor version cycle. Note that these are not necessarily the versions initially shipped with each of those releases, as subsequent updates for security or serious bugs may have been issued. This is the current list:

    Debian release   Debian version   dpkg version
    -                0.93R6           1.0.0
    buzz             1.1              1.2.6
    rex              1.2              1.4.0.5
    bo               1.3.1            1.4.0.8
    hamm             2.0              1.4.0.23.2
    slink            2.1              1.4.0.35
    potato           2.2              1.6.15
    woody            3.0              1.9.21
    sarge            3.1              1.10.28
    etch             4.0              1.13.26
    lenny            5.0              1.14.31
    squeeze          6.0              1.15.12
    wheezy           7                1.16.18
    jessie           8                1.17.27 (legacy)
    stretch          9                1.18.24 (stable)
    buster           10               1.19.x (devel)
    bullseye         11               1.20.x

Behaviour Questions

Q: Will dpkg replace a symlink with a directory or vice versa?

A: No. As dpkg does not currently track file metadata, it does not know if a symlink or directory was switched by a previous package or by the sysadmin. As part of the dpkg credo, preserving human configuration is of utmost importance, and this kind of change has always been treated as such.

As of now, this should be handled by the package maintainer scripts (possibly with aid from dpkg-maintscript-helper); failing to do so should be considered a serious bug.

Q: Can dpkg be told to avoid invoking a harmful prerm from an installed package on upgrade?

A: No. There's currently no way to tell dpkg not to execute such prerm maintainer scripts, but there is an extreme workaround available.

This applies only if the prerm is doing something really harmful, not to simple failures or actions that can be easily reverted by the new postinst. In that case the only available way is to make a new, independent package that replaces the broken package's prerm, and then make the new version of the broken package pre-depend on it. This requires modifying the dpkg database, and as such is strongly discouraged and should only be used as a last resort. Proposed by Joey Hess.

Q: Can dpkg handle volatile files?

A: Partially. dpkg does not have full native support (yet) for volatile files (those that might change during the life of a specific installed version), but there are some workarounds that can be used for now:

  1. Ship an initial file (possibly empty) in the .deb that gets modified on the filesystem later on, but this will make the file hashes not match with tools like the new «dpkg --verify» or debsums. While this might be the most convenient option, it's also possibly the most wrong one. This should never be used for volatile files such as logs or caches or the like.

  2. Do not ship the file in the .deb, and generate it at installation time, run-time or whenever, but the maintainer scripts will need to deal with the removal manually. This is the more correct but slightly more cumbersome option.
  3. If the volatile file changes are due to external forces, like updates from a website or similar, then it might be best to automatically generate a new package every time these files change, and ship them normally in a package from a repository.
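The second option could be sketched as follows for a hypothetical package foo that keeps a generated index under /var/cache/foo. The package name, the path and the function names are assumptions for illustration; in a real package each function body would live in the corresponding maintainer script, which receives the action as its first argument.

```shell
# postinst side: create the volatile file at configure time,
# since it is not shipped in the .deb.
foo_postinst() {
    action=$1
    statedir=${2:-/var/cache/foo}
    case $action in
        configure)
            mkdir -p "$statedir"
            [ -e "$statedir/index" ] || : > "$statedir/index"
            ;;
    esac
}

# postrm side: dpkg does not know about the file, so remove it by hand.
foo_postrm() {
    action=$1
    statedir=${2:-/var/cache/foo}
    case $action in
        remove|purge)
            rm -f "$statedir/index"
            rmdir "$statedir" 2>/dev/null || true
            ;;
    esac
}
```

Because the file never appears in the package's file list, tools comparing file hashes are not confused by its changes, at the cost of the manual cleanup in postrm.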

In the future, dpkg might acquire the ability to register external files, and one kind of such files could be changing files, so dpkg would take care of removing them, and would not store file hashes for these, etc.

Q: Why are substvars not effective (by default) on parts of debian/control?

A: Because the source stanza and some fields of the binary stanzas are intended to be static, to guarantee deterministic source packages and builds.

There are multiple reasons for this:

  • Some of the substvars are not even available or known when doing a source-only build.
  • Other tools besides dpkg-dev scripts parse debian/control, and if substvars were used in several of those fields, it would mean more complex parsing in all those implementations and would require a way to coordinate and agree on the substvars used, even for the implicit ones.

  • To be able to know the end result for those fields, one would need to run code, or prove that debian/rules is not doing anything unexpected (without running it), which makes analyzing potentially untrusted source packages more difficult.
  • The source information provided in the Sources indices would no longer be useful, or any consumer would need to expand those substvars too somehow, but that might be package specific logic.
  • It would require installing some parts of the build dependencies to be able to even build the source package with dpkg-source -b.

In Debian this is also prohibited anyway (even if the tools allowed it) by the ftp-masters (https://ftp-master.debian.org/REJECT-FAQ.html).

Relevant bugs: 5210 (and merged ones), 677474 (and merged ones).

Q: Why are .buildinfo files always generated with dpkg-buildpackage?

A: Because dpkg-buildpackage always performs a build, so recording the build information is relevant.

By default dpkg-buildpackage performs active tasks such as cleaning via debian/rules, and makes sure that the dependencies from Build-Depends are satisfied, as these are needed by the clean target. In addition, the clean target can perform any kind of action that will affect the source package, which is also part of the build and should by default also be reproducible. Also, when doing source-only uploads it might be desirable to make sure that the binary build actually succeeds: something like dpkg-buildpackage --changes-option=-S will generate a source-only .changes for upload but will still build all the binary packages, which can then be checked or compared with a second build for reproducibility issues, for example.

If the intention is to just produce a source package instead of an actual build to upload, then using dpkg-source is always the better option.

(In the future dpkg-dev might grow a new dpkg-modchanges which could be used to filter out specific files for example.)

Q: What are version epochs and why and when are they needed?

A: An epoch helps handle changes in upstream versioning schemes.

Versions are assumed to always increase going forward, so epochs exist to make it possible to reset the version to an earlier number. They are a mechanism to cope with the bogus realities in software releasing and distribution. The problem with epochs is that they invalidate any existing versioned relationship within and beyond the distribution they are deployed into (including local admin changes, and derivatives), as they make those relationships meaningless. These relationships can appear in many places, which can be hard to check (e.g. dependency fields, shlib control files, maintainer scripts, native packages code, etc.).

The cases epochs were designed for:

  • Upstream completely changed its version scheme, and went backwards (e.g. from 403 to 1.0.1). (Arguably this is a bad practice and upstreams should not be doing this, but when this happens there's no other proper way around it. Extra care must be taken to make sure the new scheme will not end up reusing already released version numbers.)

  • The package maintainer messed up the version scheme (on a versionless upstream, for example by using a date-based version and not prefixing the version with "0." or "0~"), and then upstream started properly versioning their releases. (Arguably this should have been predicted at packaging time or rejected at archive acceptance time, but once done there's no other proper way around it.)
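The ordering behind these two cases can be checked with dpkg --compare-versions. The sketch below uses the 403 to 1.0.1 example from above plus a made-up date-based version 20170906; the helper version_epoch is hypothetical and only extracts the epoch from a version string. The dpkg checks are skipped when dpkg is not installed.

```shell
# Print the epoch of a Debian version string (0 when none is present).
version_epoch() {
    case $1 in
        *:*) printf '%s\n' "${1%%:*}" ;;
        *)   printf '0\n' ;;
    esac
}

version_epoch 403        # no epoch, so this counts as epoch 0
version_epoch 1:1.0.1    # epoch 1

if command -v dpkg >/dev/null 2>&1; then
    # Upstream went backwards: without an epoch the new version sorts lower.
    dpkg --compare-versions 1.0.1 lt 403 && echo "1.0.1 sorts before 403"
    # The epoch makes the new scheme sort higher again.
    dpkg --compare-versions 1:1.0.1 gt 403 && echo "1:1.0.1 sorts after 403"
    # But it also satisfies relationships it should not, such as (>= 2.0).
    dpkg --compare-versions 1:1.0.1 ge 2.0 && echo "1:1.0.1 satisfies '>= 2.0'"
    # A "0~" prefix on a date-based version avoids the problem up front.
    dpkg --compare-versions 0~20170906 lt 1.0 && echo "0~20170906 sorts before 1.0"
fi
```

The third check is the side effect described earlier: once an epoch is in place, versioned relationships written against plain upstream numbers stop meaning what their authors intended.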

Some that might be acceptable, with conditions:

  • Upstream updated/corrected its version scheme, and went partially backwards (e.g. from 1.220 to 1.2.20). The version in the package might still be transformable (thus avoiding the need for an epoch) using the same previous scheme, as long as a new version does not make it impossible (such as with 1.20.1, which would then require an epoch). (Extra care must be taken to make sure the new scheme will not end up reusing already released version numbers.)

  • A source or binary package name is reused (after N distribution releases) to ship some other project with a lower version. (Arguably reusing the same package name to ship some other project is perhaps a questionable practice, but not unheard of.)
  • A source package merged the functionality of another source package and its matching binary package(s), with a higher version, which is intended to exist only for a transitional period. This case should be handled by either bumping the epoch only for the merged binary package(s), which will eventually disappear, or by pinning the higher version for the binary package(s) and appending the source version to it (e.g. merged-source 2.0, old-source 4.0, old-binary 4.0+2.0; merged-source 2.1, old-binary 4.0+2.1, etc.).

And some that are outright wrong:

  • A repackaged source tarball (the usual practice is to append something like +ds or +dfsg);
  • A temporary version revert, due to:
    • An accidental upload from the wrong suite, or during an ongoing transition, etc;
    • A detected severe issue after the fact (license, ABI breaks, catastrophic crashes, data loss, etc.);
    (These can be worked around with the 2really1 versioning scheme, which has the same devastating properties as an epoch, but it's at least temporary. Proper solutions to this problem could be implementing an archive section prepared to handle downgrades, with a higher package manager pin than the usual areas, or adding a new field denoting the forced downgrade from a specific version only.)
  • To win the version race against a third-party repository. (This can never end well, as both parties can keep increasing the epoch endlessly; this is a version arms race.)
  • Merging downstream specific epoch bumps. (This means the upstream distribution is bound to the max of all downstream epoch numbers for any package; it'd be fair to say that if a downstream did an uncoordinated bump, they can also own it and keep that as a delta.)

An epoch is confusing to users, distracting as it clutters the version string, and technically problematic, and it's a permanent stigma denoting that someone along the chain messed up. In the few situations for which it was designed, it's the best we can do (a necessary evil), but in the other cases epochs should either never be used at all, or at least not without very strongly considering other alternatives!

Feature Request Questions

Q: Can we add new fields to .dsc files?

A: Adding new fields can be considered, as long as it is information that is source-related, and there is no convenient way to retrieve it otherwise (for example before unpacking the source).

One problem with the .dsc file is that it conflates the source package format and the .dsc file format. So in most cases adding new fields will affect all source formats.

New additions should probably be discussed at least in the debian-dpkg mailing list, and depending on the topic and type of change, possibly either on debian-devel and/or other specialized mailing lists (such as the autopkgtest-devel list, or similar).

Q: Can we add new fields to .changes files?

A: Adding new fields can be considered, as long as it is information that is upload-related, and there is no convenient way to retrieve it otherwise (for example before processing the upload).

New additions should probably be discussed at least in the debian-dpkg mailing list and with the Debian ftp-masters, and depending on the topic and type of change, possibly either on debian-devel and/or other mailing lists involving archive software or similar.

Q: Can we add support for new compressors for .dsc packages?

A: Adding support for more compression formats to dpkg-source has some benefits: it avoids having to repack the original source tarball, so checksums and upstream signatures are preserved, and it might also improve the compression ratio compared to previously supported formats.

But then, the costs of adding support for an additional compression format are quite high, to the point that those benefits seem small compared to them. The costs are (at least): increase in the build-essential set; the new compression becomes part of the source format forever, as it can never be taken back as old Debian packages or 3rd-party packages in the wild might be using that format; it's another (uncommon) package required on non-Debian-based systems to be able to handle those source packages.

With the above in mind, the new compression format should really be very widely used to distribute upstream source and it should improve compression ratio compared to previously supported formats to start considering adding it.

Q: Can we add support for new compressors for .deb packages?

A: Adding support for more compression formats to dpkg-deb can be considered, as long as it provides significant advantages over previously supported compression formats.

But then, the costs of adding support for an additional compression format are quite high, to the point that those benefits seem small compared to them. The costs are (at least): increase in the pseudo-essential set; the new compression becomes part of the binary format forever, as it can never be taken back as old Debian binary packages or 3rd-party packages in the wild might be using that format; it's another (uncommon) package required on non-Debian-based systems to be able to handle those binary packages; and all binary package parsers need to be updated to support the new format (see Teams/Dpkg/DebSupport).

With the above in mind, the new compression format should have portable implementations, should be somewhat common, and it should improve compression ratio compared to previously supported formats to start considering adding it.

Q: Can we add support for new default environment variables set by dpkg-buildpackage?

A: Depends. No, for any variable required by debian/rules; yes, for anything that the user would need to specify manually anyway.

The official entry point to build a source package is still the debian/rules file, and as long as the Debian project does not change its stance on that point, any environment variable required to build the binary packages must be defined by debian/rules, otherwise the build will fail when not using dpkg-buildpackage. Examples of this could be the current locale (see 873919, 843776), or the build flags (which used to be defined but got reverted due to the increased breakage caused by the assumptions source packages were starting to make).

Any variable that would need to be set by the user anyway to change the behaviour of the build, can be set by dpkg-buildpackage as a matter of convenience. Examples of this could be the environment needed for a cross-compilation.

The other option is to set any variable via the dpkg Makefile fragments, which are intended to be included by debian/rules; obviously this will only cover the packages that include those fragments.

Q: Can we add support for new default build flags to dpkg-buildflags?

A: Certainly. But before considering adding any flag to the default, the following things would need to be considered:

  • The flag should not involve any warning about style issues (those are subjective and dependent on upstream coding standards).
  • A test rebuild of the whole archive with and without the flags, and a comparison to see how much difference there is in the amount built.
  • For non-warning flags, a comparison of the build logs to see the memory and build time difference if these might seem relevant (sbuild should provide those).
  • For flags that change run-time semantics, ideally an additional run of the autopkgtest for packages that ship them (although this cannot be deemed conclusive as our coverage is not great yet).
  • Once these are done, and if it still seems worth it, a discussion started in debian-devel proposing the change, asking if there are other known issues, concerns, etc.

Q: Can we add support for new dpkg architectures?

A: Sure. These are the current requirements for a new dpkg architecture:

  • It should have an official GNU triplet in the GNU config project.

  • It should have support merged upstream in at least GNU binutils, gcc, and the respective libc project used.

  • It should not require the machine manufacturer (or vendor) part of the GNU triplet to be used to distinguish the ABI, nor should it expose it as unknown in the GNU triplet names nor internally in the Debian tuple.

  • It should not have the same (full) ABI as any existing dpkg architecture.
  • The mapping between a GNU triplet and a dpkg architecture should be 1:1 (i.e. bijective).
  • The dpkg architecture name should have been vetted by the current architecture porting team and/or the dpkg maintainers.
  • The dpkg architecture name should try to use a pattern similar to an already existing and related architecture (there are existing exceptions to this due to historical reasons, but those should not be used as precedent!).
    • The bits size gets appended to the base architecture cpu name (for example sparc and sparc64).
    • The endianness gets appended to the base architecture cpu name. The suffixes are eb for big-endian and el for little-endian. If the architecture can only operate with one endianness, then there's no need to suffix it. If one endianness is the prevalent one and the other sees marginal use, then the prevalent one can omit the endianness suffix.

  • The dpkg architecture name has the following additional characteristics:
    • The name is a tuple with the components in reverse order compared to the GNU triplet, <abi>-<libc>-<kernel>-<cpu>.

    • If the tuple only has one component the <kernel> is assumed to be linux.

    • If the tuple is kernel independent then <kernel> should be none.

    • If the tuple only has one or two components the <libc> is assumed to be the baseline for that port, so on a glibc-centric port uncommon variations should use the three- or four-component form of the tuple (e.g. uclibc or musl variants).

    • If the tuple only has three components the <abi> is assumed to be the baseline for that port, which will be assumed to be base.

    • If the <abi> is different from base, then it will normally be the ABI part that is merged into the GNU triplet as <cpu>-<kernel>-<libc><abi>, as in arm-linux-gnueabi, where <libc> is gnu and <abi> is eabi.
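The defaulting rules for shortened names can be illustrated with a small sketch that expands a dpkg architecture name into the full four-component tuple. The function name expand_arch_tuple is made up, and gnu is assumed as the baseline libc unless a second argument says otherwise. This only mirrors the rules listed above: dpkg's real mapping comes from its own tables and includes irregular names (armel, for instance) that this sketch does not handle.

```shell
# Expand a dpkg architecture name into <abi>-<libc>-<kernel>-<cpu>,
# filling in the defaults for the components that were left out.
expand_arch_tuple() {
    name=$1
    libc=${2:-gnu}    # assumed baseline libc for the port
    case $name in
        *-*-*-*-*) return 1 ;;                      # too many components
        *-*-*-*)   echo "$name" ;;                  # already a full tuple
        *-*-*)     echo "base-$name" ;;             # <libc>-<kernel>-<cpu>
        *-*)       echo "base-$libc-$name" ;;       # <kernel>-<cpu>
        ?*)        echo "base-$libc-linux-$name" ;; # <cpu> only
        *)         return 1 ;;                      # empty name
    esac
}
```

With these assumptions, expand_arch_tuple amd64 gives base-gnu-linux-amd64, kfreebsd-amd64 gives base-gnu-kfreebsd-amd64, and musl-linux-armhf gives base-musl-linux-armhf.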