Unifying the process to strip files with problematic copyright from upstream tarballs

Deleting Files using Files-Excluded field in debian/copyright

You can exclude files from a repackaged source tarball by adding a Files-Excluded: field containing a list of patterns to debian/copyright, which follows the machine-readable copyright format (see also the proposal to update the specification in bug 685506). Each pattern is matched using find -path.
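The matching described above can be approximated in a few lines of Python. This is only a sketch, not the uscan implementation: it assumes that Python's fnmatch is an acceptable stand-in for find -path (in both, * also matches /), that paths are compared relative to the top of the unpacked tree, and the function name excluded_paths is made up for illustration.

```python
import fnmatch
import os


def excluded_paths(root, patterns):
    """Return the paths below root (relative, '/'-separated) matching any pattern."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            # The pattern must match the whole relative path, and '*' may
            # span directory separators, as with find -path.
            if any(fnmatch.fnmatchcase(rel, pattern) for pattern in patterns):
                matches.append(rel)
    return sorted(matches)
```

With the patterns *.jar and release/Mac, a tree containing lib/a.jar and release/Mac/app yields lib/a.jar and release/Mac; the file below release/Mac is not listed itself, but it goes away when its directory is removed.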

As suggested in a discussion on the devscripts-devel mailing list, a more flexible implementation than the one in 685506 is proposed:

So the expression

would exclude

Example debian/copyright file:

Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: Spread
Source: https://github.com/phylogeography/SPREAD/downloads
Files-Excluded:
    *.jar
    release/Mac
    release/Windows
    release/tools
    bin
    classes
    .git

(Please also read the section below on what happens once files are removed.)
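To make the field layout in the example concrete, here is a minimal Python sketch that pulls the patterns out of the header paragraph. It assumes the field lives in the first paragraph and that patterns are separated by whitespace, as in the example; files_excluded is a hypothetical helper, not part of uscan.

```python
def files_excluded(copyright_text):
    """Return the whitespace-separated patterns of the Files-Excluded field."""
    patterns = []
    in_field = False
    for line in copyright_text.strip().splitlines():
        if not line.strip():
            # A blank line ends the header paragraph.
            break
        if line[0] in " \t":
            # Continuation line: it belongs to the field started above it.
            if in_field:
                patterns.extend(line.split())
            continue
        name, _, value = line.partition(":")
        in_field = name.strip().lower() == "files-excluded"
        if in_field:
            patterns.extend(value.split())
    return patterns
```

Applied to the example file above, this returns the seven patterns from *.jar to .git in the order they are listed.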

Once we start removing files

The current implementation in the git repository from Andreas Tille does a bit more once repackaging becomes necessary:

Removing VCS cruft from tarball

When repackaging, tar --exclude-vcs is used, since there is usually no point in keeping VCS metadata in upstream tarballs. Whether this option should be applied unconditionally should be debated in a separate thread, but that is how the current implementation behaves. In the example above, listing .git is therefore redundant because it is left out anyway.
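The effect of dropping VCS metadata while repackaging can be sketched with Python's tarfile module. This is an illustration under assumptions, not what uscan does: the set of names below is only a subset of what tar --exclude-vcs skips, and repack_without_vcs is a made-up helper.

```python
import tarfile

# Illustrative subset of the names skipped by tar --exclude-vcs.
VCS_NAMES = {".git", ".gitignore", ".svn", ".hg", ".hgignore", ".bzr", "CVS", "_darcs"}


def is_vcs_member(name):
    """True if any component of the member path is a known VCS name."""
    return any(part in VCS_NAMES for part in name.split("/"))


def repack_without_vcs(src, dst, compression="gz"):
    """Copy the tarball in file object src to dst, skipping VCS members."""
    with tarfile.open(fileobj=src, mode="r:*") as old, \
            tarfile.open(fileobj=dst, mode="w:" + compression) as new:
        for member in old:
            if is_vcs_member(member.name):
                continue
            # Only regular files carry data that has to be copied over.
            data = old.extractfile(member) if member.isfile() else None
            new.addfile(member, data)
```

Both arguments are seekable binary file objects, so the sketch works on open files as well as on in-memory buffers.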

Specifying better compression method

Rafael Laboissiere extracted a patch that was merged into the private repository of Andreas Tille and submitted a separate bug, 730768, to deal with this enhancement.

You can specify a more suitable compression method with uscan --repack-compression <compression>, where <compression> is one of xz, bz2, gz, or lzma. The current default is gz; the author is tempted to change the default to xz.
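What changing the compression amounts to can be sketched in Python: decompress the upstream gzip payload and compress it again with the chosen method. The mapping from the four option values to Python's standard compression modules is an assumption made for this sketch, and recompress_gz is a hypothetical name; uscan itself may proceed differently.

```python
import bz2
import gzip
import lzma

# One compressor per value accepted by --repack-compression.
COMPRESSORS = {
    "gz": gzip.compress,
    "bz2": bz2.compress,
    "xz": lambda data: lzma.compress(data, format=lzma.FORMAT_XZ),
    "lzma": lambda data: lzma.compress(data, format=lzma.FORMAT_ALONE),
}


def recompress_gz(tarball_gz, compression):
    """Take the bytes of a .tar.gz and return them recompressed."""
    if compression not in COMPRESSORS:
        raise ValueError("unsupported compression: " + compression)
    raw_tar = gzip.decompress(tarball_gz)
    return COMPRESSORS[compression](raw_tar)
```

The tar stream itself is untouched; only the outer compression layer changes.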

Considerations about debian/copyright pattern specification

The thread showed that the format is quite difficult to understand. The next revision should explicitly mention that a pattern ending with / or beginning with ./ will never match anything (688481).
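Until the specification says so explicitly, such patterns could be flagged by a small check. The following Python sketch encodes only the rule stated above; never_matching is a hypothetical helper name.

```python
def never_matching(patterns):
    """Return the patterns that end with '/' or begin with './'."""
    return [p for p in patterns if p.endswith("/") or p.startswith("./")]
```

For the patterns bin/, ./classes, *.jar and release/Mac it reports the first two, which would silently exclude nothing.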

Information for developers

TODO

Brackets in debian/copyright patterns should be escaped before being passed to find, as they are metacharacters for find but not in debian/copyright. Some shell metacharacters should also be escaped (consider the pattern "$(evil_command)"). The actual unlink/rmdir actions should be echoed depending on command line options, environment, or debug level. All this should be checked once both implementations have been merged.
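The first two TODO items can be sketched in Python as follows. The assumptions are loud ones: brackets are escaped with a backslash, the pattern is prefixed with the search root so that find -path can match it, and the command is built as an argument list so that no shell ever sees the pattern. Both function names are invented for this sketch and the merged implementation may look different.

```python
import re


def escape_for_find(pattern):
    """Backslash-escape '[' and ']' so that find treats them literally."""
    return re.sub(r"([\[\]])", r"\\\1", pattern)


def find_argv(root, pattern):
    """Build a find command line as an argument list (no shell involved)."""
    # Assumption: the pattern is anchored at the search root.
    return ["find", root, "-path", root + "/" + escape_for_find(pattern)]
```

Passing the resulting list to subprocess.run without shell=True hands a pattern such as $(evil_command) to find as a literal string instead of executing it.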