Differences between revisions 1 and 52 (spanning 51 versions)
Revision 1 as of 2012-09-22 20:25:44
Size: 6092
Comment:
Revision 52 as of 2014-12-23 04:04:51
Size: 1703
Comment: Link to the specification.
Line 1: Line 1:
This page attempts to summarize [[https://lists.debian.org/debian-devel/2012/08/msg00380.html|a long debian-devel thread]] and allow more structured discussion about uscan automatically handling some common repackaging situations: removing some files or changing the compression scheme.
= Unifying the process to strip files with problematic copyright from upstream tarballs =
Line 3: Line 3:
= Existing solutions =
The following `uscan` features, available since devscripts version 2.14.2, help when you need to tweak upstream tarballs.
Line 5: Line 5:
Many packages call a script from the get-orig-source debian/rules target.
Some other maintainers let uscan call it from debian/watch.
== Deleting Files using Files-Excluded field in debian/copyright ==
Line 8: Line 7:
Daniel Leidert calls [[http://anonscm.debian.org/viewvc/debichem/unstable/openbabel/debian/get-orig-source.sh?view=markup|a script]] from [[http://anonscm.debian.org/viewvc/debichem/unstable/openbabel/debian/watch?view=markup|debian/watch]].

Mike Hommey also calls [[http://anonscm.debian.org/gitweb/?p=pkg-mozilla/iceweasel.git;a=blob;f=debian/repack.py;h=a797d5471f20e0f8de155d483e5ad2f1b2c3bdc5;hb=c1ebf8be93add288837377e4fdd87f9c9f1082cc|a script]] from [[http://anonscm.debian.org/gitweb/?p=pkg-mozilla/iceweasel.git;a=blob;f=debian/watch;h=a81586cc891ab608c52717069c99703227e4d077;hb=c1ebf8be93add288837377e4fdd87f9c9f1082cc|debian/watch]] that filters the tarball while it is being downloaded, without extracting it to disk.
The set of files to remove is passed through a separate file, which supports wildcards and extra (sed-like) filters.
This may seem a pointless optimization, but for huge source tarballs (say 80MB bzipped) and slow download links, the whole process takes about as long as the download alone.
See [[http://anonscm.debian.org/gitweb/?p=pkg-mozilla/iceweasel.git;a=blob;f=debian/source.filter;h=ec7efac7b97add1f39480c07fecb4b70ae7a7ec8;hb=c1ebf8be93add288837377e4fdd87f9c9f1082cc|this example]].

Another variant is used by [[http://anonscm.debian.org/gitweb/?p=pkg-perl/scripts.git;a=tree|pkg-perl]], documented [[http://pkg-perl.alioth.debian.org/howto/repacking.html|here]].

There is also the `--filter-pristine-tar` option to `git-import-orig`. See this [[http://anonscm.debian.org/gitweb/?p=pkg-ocaml-maint/packages/why.git;a=blob;f=debian/gbp.conf;h=4435dcbe6d877cec7f562e8757939b7e98ecf5d8;hb=HEAD|gbp.conf example]]. `git-import-orig` may later be modified to ignore files excluded by debian/copyright as uscan does. It already handles changing the compression scheme.

== How to trigger the repackaging ==

The `--repack` option should accept an argument specifying that the tarball archive should be repackaged with another compression method, even in case no file has to be removed.

If debian/copyright explicitly mentions patterns designating files to exclude, `uscan --repack` must repack the tarball.
The suffix `+dfsg` is then appended to the version string to indicate that the content of the original source was changed. This suffix should be configurable, in case upstream re-releases the same upstream version repackaged to fix a purely tarball-related issue.
To prevent uscan from doing so, pass the `--no-exclusion` command-line option or set the `USCAN_NO_EXCLUSION` variable in /etc/devscripts.conf
or ~/.devscripts.
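
The decision described above can be sketched in a few lines of Python. This is only an illustration of the proposal, not uscan's code: the function name `repacked_version` and its parameters are hypothetical, with `no_exclusion` standing in for `--no-exclusion` / `USCAN_NO_EXCLUSION` and `suffix` for the configurable suffix.

```python
def repacked_version(upstream_version, excluded_patterns,
                     no_exclusion=False, suffix="+dfsg"):
    """Return the version string to use for the (possibly repacked) tarball.

    Hypothetical helper: the suffix is appended only when exclusion
    patterns exist and exclusion has not been switched off.
    """
    if no_exclusion or not excluded_patterns:
        # Nothing is removed, so the upstream version is kept unchanged.
        return upstream_version
    return upstream_version + suffix
```

Keeping the suffix a parameter covers the case mentioned above where upstream re-releases the same version and a different suffix is needed.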

Ideally, the deletion could be executed from outside uscan too, in case the upstream tarball is generated from a VCS repository and uscan is never called.
This will only be useful until uscan understands every kind of VCS in the world.

== Deleted files specification ==

This point has caused many discussions, see [[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561494|#561494]] in addition to the thread above.
It seems that a consensus has been reached (TM) to group in a single place the information about where files are copied from and why they are, or are not, allowed to be redistributed.

The latest [[http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/|debian/copyright format]] allows defining sets of files sharing the same license by successive exclusions. Existing parsers and glob syntax may be reused if a fake license is defined, meaning that the maintainer wants some files out of the Debian tarball.

Here is an example debian/copyright. Excluded patterns are listed separately to demonstrate per-file-set comments. In real life, "Text of GPL3+" would be in a separate paragraph.
By using the field {{{Files-Excluded}}}:
Line 41: Line 9:
Files: *
License: GPL3+
 Full license text.

Files: __MACOSX
License: not-shipped-by-debian
 Optionally explain here why __MACOSX is rejected.

Files: *.jar
License: not-shipped-by-debian
 Optionally explain here why jar files are rejected.

Files: rdp_classifier_2.5/lib/ReadSeq.jar
License: GPL3+
 Full license text.
Files-Excluded: foo/bar.js
Line 58: Line 12:
== debian/copyright pattern specification ==
in the header paragraph of `debian/copyright`, you can exclude files from an upstream tarball (see `man uscan`).
Line 60: Line 14:
Lintian could produce a warning if a file designated for removal in `debian/copyright` still exists in the source package.
This field accepts a whitespace-separated list of patterns like [[https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/#files-field|Files]]. Example:
{{{
Files-Excluded: */Makefile.in aclocal.m4 config.h.in configure
}}}
This feature is not yet documented in the `debian/copyright` file format [[https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/|specification]], but there is an open bug about this (DebianBug:685506).
Line 62: Line 20:
The list of accepted license abbreviations in
[[http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/|the 1.0 copyright format]] should be updated.
== Specifying better compression method ==
Line 65: Line 22:
The thread showed that understanding the format is quite difficult.
The next revision should explicitly mention that a pattern ending with `/` or beginning with `./` will never match anything.
More generally, tools should produce a warning when one of the patterns does not match any of the files remaining after execution of the previous patterns.
This would be useful for all licenses, not only for stanzas removing files.
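
A minimal Python sketch of such a check follows. It assumes the pattern semantics of the 1.0 copyright format (`*` and `?` both match `/`, a backslash escapes the next character) and reads "remaining after execution of the previous patterns" as "not yet claimed by an earlier pattern". The names `pattern_to_regex` and `check_patterns` are hypothetical and are not taken from any existing tool.

```python
import re

def pattern_to_regex(pattern):
    """Translate a copyright-format style pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped literal
            i += 2
            continue
        if ch == "*":
            out.append(".*")   # any run of characters, slashes included
        elif ch == "?":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out) + r"\Z", re.DOTALL)

def check_patterns(patterns, files):
    """Apply patterns in order; return (remaining_files, warnings)."""
    remaining = list(files)
    warnings = []
    for pattern in patterns:
        if pattern.endswith("/") or pattern.startswith("./"):
            warnings.append("%r can never match (ends with '/' or begins with './')" % pattern)
            continue
        regex = pattern_to_regex(pattern)
        matched = [name for name in remaining if regex.match(name)]
        if not matched:
            warnings.append("%r matches none of the remaining files" % pattern)
        remaining = [name for name in remaining if name not in matched]
    return remaining, warnings
```

For instance, with the files `__MACOSX/x`, `lib/a.jar` and `src/main.c`, the pattern list `__MACOSX/* *.jar *.jar ./src/* doc/` produces three warnings: one for the repeated `*.jar`, one for `./src/*` and one for `doc/`.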
You can request a different compression method with `uscan --repack --compression <compression>`, where `<compression>` is one of `xz`, `bz2`, `gz`, or `lzma` (see `man uscan`).
Line 70: Line 24:
== Implementation ==
Line 72: Line 25:
The current implementation attempt is maintained by Andreas Tille in [[git://git.debian.org/git/users/tille/devscripts.git|this git repository]].
== Not yet implemented but potentially helpful ==
Line 74: Line 27:
TODO: brackets in `debian/copyright` patterns should be escaped before being passed to find.

[[https://lists.debian.org/debian-devel/2012/08/msg00425.html|Debian::Copyright]] packaged as libdebian-copyright-perl.
[[https://lists.debian.org/debian-devel/2012/08/msg00512.html|Parse::DebControl]] packaged as libparse-debcontrol-perl, used in devscripts.

[[https://lists.debian.org/debian-devel/2012/08/msg00507.html|Dpkg::Control::Hash]] packaged as libdpkg-perl, used in devscripts.

The first seems too strict about non-standard fields.
The latter two seem so similar that the eventual choice may be deferred to the uscan maintainer.
When repacking, `tar --exclude-vcs` is used, since there is usually no point in having VCS metainformation in upstream tarballs. Whether this option should be used unconditionally should be debated in a separate thread, but that is what the current implementation does. So, in the example above, the specification of `.git` is redundant because it will be left out anyway. The feature to exclude VCS information from upstream tarballs will be the subject of a future bug report.

Unifying the process to strip files with problematic copyright from upstream tarballs

The following features of uscan since devscripts version 2.14.2 are helpful to deal with upstream tarballs you need to tweak.

Deleting Files using Files-Excluded field in debian/copyright

By using the field Files-Excluded:

Files-Excluded: foo/bar.js

in the header paragraph of debian/copyright, you can exclude files from an upstream tarball (see man uscan).

This field accepts a whitespace-separated list of patterns like Files. Example:

Files-Excluded: */Makefile.in aclocal.m4 config.h.in configure

This feature is not yet documented in the debian/copyright file format specification, but there is an open bug about this (685506).
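
To make the pattern behaviour concrete, here is a small Python sketch that reads Files-Excluded from the header paragraph and splits a list of tarball members into kept and dropped files. It is not uscan's implementation: the function names are hypothetical, member names are assumed to be relative to the top-level directory of the tarball, and the pattern semantics are assumed to be those of the Files field (`*` and `?` both match `/`, a backslash escapes the next character).

```python
import re

def pattern_to_regex(pattern):
    """Translate a Files-style pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped literal
            i += 2
            continue
        if ch == "*":
            out.append(".*")   # any run of characters, slashes included
        elif ch == "?":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out) + r"\Z", re.DOTALL)

def read_files_excluded(copyright_text):
    """Collect the Files-Excluded patterns from the header paragraph."""
    header = copyright_text.split("\n\n", 1)[0]
    patterns = []
    in_field = False
    for line in header.splitlines():
        if line.lower().startswith("files-excluded:"):
            in_field = True
            patterns.extend(line.split(":", 1)[1].split())
        elif in_field and line[:1] in (" ", "\t"):
            patterns.extend(line.split())  # continuation line
        else:
            in_field = False
    return patterns

def split_members(members, patterns):
    """Return (kept, dropped) member names for the given patterns."""
    regexes = [pattern_to_regex(p) for p in patterns]
    kept, dropped = [], []
    for name in members:
        if any(r.match(name) for r in regexes):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped
```

Under these assumptions, `*/Makefile.in` drops `src/Makefile.in` but keeps a top-level `Makefile.in`, because the pattern requires a `/` before the file name.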

Specifying better compression method

You can request a different compression method with uscan --repack --compression <compression>, where <compression> is one of xz, bz2, gz, or lzma (see man uscan).
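
What such a recompression amounts to can be sketched with Python's standard tarfile module. This is an illustration only, not how uscan does it; the `repack` function and the `MODES` table are hypothetical, and the legacy lzma format is left out because tarfile does not write it directly.

```python
import tarfile

# Compression names from the text mapped to tarfile write modes
# (an assumption of this sketch; "lzma" is deliberately omitted).
MODES = {"gz": "w:gz", "bz2": "w:bz2", "xz": "w:xz"}

def repack(src_path, dst_path, compression="xz"):
    """Copy every member of src_path into dst_path with another compression."""
    with tarfile.open(src_path, "r:*") as src, \
            tarfile.open(dst_path, MODES[compression]) as dst:
        for member in src:
            # Only regular files carry data; directories and links do not.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
```

Copying member by member keeps the original metadata (names, modes, timestamps) and avoids unpacking the tarball to disk.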

Not yet implemented but potentially helpful

When repacking, tar --exclude-vcs is used, since there is usually no point in having VCS metainformation in upstream tarballs. Whether this option should be used unconditionally should be debated in a separate thread, but that is what the current implementation does. So, in the example above, the specification of .git is redundant because it will be left out anyway. The feature to exclude VCS information from upstream tarballs will be the subject of a future bug report.
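
The effect of dropping VCS metainformation during a repack can be approximated in Python as follows. This is a sketch, not the uscan code: `VCS_NAMES` is a partial, illustrative list (GNU tar's --exclude-vcs covers more names), and both function names are hypothetical.

```python
import tarfile

# Partial, illustrative list of VCS-related names (assumption of this sketch).
VCS_NAMES = {".git", ".gitignore", ".svn", ".hg", ".hgignore",
             ".bzr", ".bzrignore", "CVS", ".cvsignore", "_darcs"}

def is_vcs_path(name):
    """True if any path component is a known VCS file or directory name."""
    return any(part in VCS_NAMES for part in name.split("/"))

def repack_without_vcs(src_path, dst_path):
    """Copy src_path to an xz-compressed tarball, skipping VCS members."""
    with tarfile.open(src_path, "r:*") as src, \
            tarfile.open(dst_path, "w:xz") as dst:
        for member in src:
            if is_vcs_path(member.name):
                continue
            # Only regular files carry data; directories and links do not.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
```

Checking every path component means that everything below a directory such as .git is skipped as well, without listing it separately.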