This page attempts to summarize a long debian-devel thread and allow more structured discussion about uscan automatically handling some common repackaging situations: removing some files or changing the compression scheme.

The current implementation is maintained by Andreas Tille in this git repository.

Historical hacks that deal with the problem to some extent

Many packages call a script from the get-orig-source debian/rules target. Some other maintainers let uscan call it from debian/watch.

Daniel Leidert calls a script from debian/watch.

Mike Hommey also calls a script from debian/watch; it filters the tarball while it is being downloaded, without actually extracting it to disk. The set of files to remove is passed through a separate file, which supports wildcards and extra (sed-like) filters. This may seem like a worthless optimization, but for huge source tarballs (say 80MB bzipped) and slow download links, the whole process is about as fast as downloading alone. See this example.

Another variant is used by pkg-perl, documented here.

There is also the --filter-pristine-tar option to git-import-orig. See this gbp.conf example. git-import-orig may later be modified to ignore files excluded by debian/copyright, as uscan does. It already handles changing the compression scheme.

How to trigger the repackaging

The --repack option should accept an argument specifying that the tarball archive should be repackaged with another compression method, even in case no file has to be removed.
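
As an illustration of what a compression-only repack amounts to, here is a minimal Python sketch that streams a gzip-compressed tarball into an xz-compressed one without touching its contents. It is not uscan's code; the function name recompress_gz_to_xz and the choice of gzip and xz are assumptions made for the example.

```python
import gzip
import lzma
import shutil


def recompress_gz_to_xz(src_path, dst_path):
    """Stream a gzip-compressed file into an xz-compressed one.

    The tar archive inside is copied byte for byte; only the
    compression layer changes. (Hypothetical helper, not part of uscan.)
    """
    with gzip.open(src_path, "rb") as src, lzma.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Since the data is streamed, the uncompressed tarball never has to be written to disk.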

If files are actually excluded, uscan --repack must repack the tarball. The suffix +dfsg will then be appended to the version string to express the fact that the content of the original source was changed. This suffix should be configurable, in case upstream re-releases the same upstream version repackaged to fix a purely tarball-related issue. To prevent uscan from excluding files, the --no-exclusion command-line option may be passed, or the USCAN_NO_EXCLUSION variable may be set in /etc/devscripts.conf or ~/.devscripts.
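
The version rule described above can be sketched as follows. This is only an illustration in Python; the function name repacked_version and its parameters are hypothetical and do not come from uscan.

```python
def repacked_version(upstream_version, files_removed, suffix="+dfsg"):
    """Return the version to use for the repackaged tarball.

    The suffix is only appended when files were actually removed; it is
    a parameter so that it can be changed (for example to "+dfsg2").
    (Hypothetical helper, not part of uscan.)
    """
    if not files_removed:
        return upstream_version
    return upstream_version + suffix
```

For example, repacked_version("2.5", True) gives "2.5+dfsg", while repacked_version("2.5", False) leaves the version unchanged.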

Ideally, the deletion could be executed from outside uscan too, in case the upstream tarball is generated from a VCS repository and uscan is never called. This will only be useful until uscan understands all VCS kinds in the world.

Deleted files specification

This point has caused many discussions; see #561494 in addition to the thread above. It seems that a consensus has been reached (TM) to group in a single place the information about where files are copied from, and why they may not be redistributed or where they may be. debian/copyright seems a natural candidate, even if its name suggests something less general.

It may be useful to let Lintian produce a warning when a file designated for removal still exists in the source package.

The current implementation relies on a new Files-Excluded: pattern field in the debian/copyright format. The pattern is searched in the top directory with find -name if it contains no slash, with find -path if it contains one, then all matching files or directories are removed from the repackaged tarball.
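
The choice between find -name and find -path can be sketched as below. This is a Python illustration rather than the actual uscan code, and the function name find_command is hypothetical. Because find -path matches against the whole path including the starting directory, the sketch assumes that the pattern has to be prefixed with that directory.

```python
def find_command(top_dir, pattern):
    """Build the find(1) argument list for one Files-Excluded pattern.

    Patterns without a slash are matched against the base name
    (-name); patterns with a slash are matched against the path
    (-path). Prefixing the pattern with top_dir is an assumption of
    this sketch, needed because -path sees the path from the start
    point onwards.
    """
    if "/" in pattern:
        return ["find", top_dir, "-path", f"{top_dir}/{pattern}"]
    return ["find", top_dir, "-name", pattern]
```

The returned list can be passed to subprocess.run as is, which avoids shell quoting problems with the wildcards.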

Another solution may be considered, as in this experimental implementation. The latest debian/copyright format allows defining sets of files sharing the same license by successive exclusions. Existing parsers and glob syntax may be reused if a fake license is defined, meaning that the maintainer wants some files out of the Debian tarball. The list of accepted license abbreviations in the 1.0 copyright format should be updated. Here is an example debian/copyright. Excluded patterns are separated to demonstrate per-file-set comments. In real life, "Text of GPL3+" would be in a separate paragraph.

Files: *
License: GPL3+
 Full license text.

Files: __MACOSX */__MACOSX
License: not-shipped-by-debian
 Optionally explain here why __MACOSX are rejected.

Files: *.jar
License: not-shipped-by-debian
 Optionally explain here why jar files are rejected.

Files: rdp_classifier_2.5/lib/ReadSeq.jar
License: GPL3+
 Full license text.
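
How the example above resolves can be sketched in Python. In the copyright format, the last paragraph that matches a file applies to it, so ReadSeq.jar is kept although it also matches *.jar. The sketch uses fnmatch as an approximation of the copyright glob syntax (fnmatch additionally understands [...] sets); the names PARAGRAPHS, license_for and is_excluded are hypothetical.

```python
from fnmatch import fnmatchcase

# The Files/License pairs of the example above, in file order.
PARAGRAPHS = [
    (["*"], "GPL3+"),
    (["__MACOSX", "*/__MACOSX"], "not-shipped-by-debian"),
    (["*.jar"], "not-shipped-by-debian"),
    (["rdp_classifier_2.5/lib/ReadSeq.jar"], "GPL3+"),
]


def license_for(path, paragraphs=PARAGRAPHS):
    """Return the license of the last paragraph whose patterns match."""
    result = None
    for patterns, license_name in paragraphs:
        # "*" also matches "/" here, as in the copyright format.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            result = license_name
    return result


def is_excluded(path):
    """True if the fake license marks the path for removal."""
    return license_for(path) == "not-shipped-by-debian"
```

With these definitions, is_excluded("lib/foo.jar") is true, while is_excluded("rdp_classifier_2.5/lib/ReadSeq.jar") is false.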

Considerations about debian/copyright pattern specification

The thread showed that understanding the format is quite difficult. The next revision should explicitly mention that a pattern ending with / or beginning with ./ will never match anything (#688481).
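
A tool reading the patterns could warn about these two cases early. The following Python sketch shows one way to do it; the function name pattern_problems and the wording of the messages are hypothetical.

```python
def pattern_problems(pattern):
    """List the reasons why a Files pattern can never match anything.

    Only the two cases mentioned above are checked.
    (Hypothetical helper.)
    """
    problems = []
    if pattern.endswith("/"):
        problems.append("ends with '/'")
    if pattern.startswith("./"):
        problems.append("begins with './'")
    return problems
```

An empty list means that neither of the two problems was found, not that the pattern is otherwise valid.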

Information for developers

TODO: brackets in debian/copyright patterns should be escaped before being passed to find.
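
One possible way to do this escaping is sketched below in Python (uscan itself is written in Perl, so this is only an illustration). It assumes that a backslash is enough to make find treat [ and ] literally, which is how its fnmatch-style patterns usually behave; the function name escape_brackets is hypothetical.

```python
import re


def escape_brackets(pattern):
    """Prefix every '[' and ']' with a backslash.

    The wildcards '*' and '?' are left alone so that they keep their
    meaning when the pattern is passed to find. (Hypothetical helper.)
    """
    return re.sub(r"([\[\]])", r"\\\1", pattern)
```

For example, the pattern data[1].jar becomes data\[1\].jar, while *.jar is returned unchanged.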

Debian::Copyright packaged as libdebian-copyright-perl.

Parse::DebControl packaged as libparse-debcontrol-perl, used in devscripts.

Dpkg::Control::Hash packaged as libdpkg-perl, used in devscripts.

The first seems too strict about non-standard fields. The latter two seem so similar that the eventual choice may be deferred to the uscan maintainer.