Unifying the process to strip files with problematic copyright from upstream tarballs
Deleting files using the Files-Excluded field in debian/copyright
You can exclude files from a repackaged source tarball by adding a Files-Excluded field, containing a whitespace-separated list of patterns, to a debian/copyright file in the machine-readable format (see also the proposal to update the specification in bug 685506). Each pattern is matched using find -path.
Example debian/copyright file:
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: Spread
Source: https://github.com/phylogeography/SPREAD/downloads
Files-Excluded: *.jar release/Mac release/Windows release/tools bin classes .git
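To illustrate how the field might be read and applied, here is a minimal Python sketch. It is not the uscan code: files_excluded and excluded_paths are hypothetical helpers, and the sketch assumes each pattern is anchored at the top of the unpacked source tree, i.e. matched as ./PATTERN against paths as find prints them when run from that directory. fnmatch.fnmatchcase stands in for find -path because, like find, its * and ? also match /; for simple patterns like the ones in the example the two behave the same.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def files_excluded(copyright_text: str) -> list[str]:
    """Collect the patterns of the Files-Excluded field in the header paragraph.

    Handles both a single-line value and deb822 continuation lines
    (lines starting with a space or tab).
    """
    patterns: list[str] = []
    in_field = False
    for line in copyright_text.splitlines():
        if not line.strip():
            break  # a blank line ends the header paragraph
        if line[0] in " \t":
            if in_field:  # continuation of the previous field
                patterns.extend(line.split())
            continue
        name, _, value = line.partition(":")
        # deb822 field names are case-insensitive
        in_field = name.strip().lower() == "files-excluded"
        if in_field:
            patterns.extend(value.split())
    return patterns


def excluded_paths(paths: list[str], patterns: list[str]) -> list[str]:
    """Return the paths (as printed by `find .`) matched by some pattern.

    Assumption: each pattern is anchored as "./PATTERN". A matched
    directory would be removed together with everything below it.
    """
    return [p for p in paths if any(fnmatchcase(p, "./" + pat) for pat in patterns)]


example = """\
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: Spread
Source: https://github.com/phylogeography/SPREAD/downloads
Files-Excluded: *.jar release/Mac release/Windows release/tools bin classes .git
"""

patterns = files_excluded(example)
print(patterns)
print(excluded_paths(
    ["./README", "./lib/spread.jar", "./release/Mac", "./release/Linux", "./bin", "./src/bin"],
    patterns,
))
```

Note how *.jar also catches ./lib/spread.jar because * matches /, whereas bin only removes the top-level ./bin and leaves ./src/bin alone.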
(Please also read the section "Once we start removing files" below.)
Once we start removing files
The current implementation in Andreas Tille's Git repository does a bit more once repackaging becomes necessary:
Removing VCS cruft from the tarball
When repackaging, tar is called with --exclude-vcs. There is usually no point in shipping VCS metadata in upstream tarballs. Whether this option should be used unconditionally should be debated in a separate thread, but the current implementation does so. In the example above, listing .git is therefore redundant, because it will be left out anyway.
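The effect of tar --exclude-vcs can be approximated in Python with the filter callback of tarfile.TarFile.add, as in the sketch below. VCS_NAMES is a hand-picked subset of the names GNU tar documents for --exclude-vcs, and is_vcs_path, drop_vcs and repack are hypothetical helpers, not part of uscan or any other existing tool.

```python
import os
import tarfile

# Subset of the file and directory names GNU tar skips with --exclude-vcs.
VCS_NAMES = {
    "CVS", "RCS", "SCCS", ".cvsignore",
    ".git", ".gitignore", ".gitattributes", ".gitmodules",
    ".svn", ".bzr", ".bzrignore", ".hg", ".hgignore", ".hgtags", "_darcs",
}


def is_vcs_path(name: str) -> bool:
    """True if any component of an archive member name is VCS metadata."""
    return any(part in VCS_NAMES for part in name.split("/"))


def drop_vcs(info: tarfile.TarInfo):
    """tarfile filter: returning None excludes the member and, for a
    directory, stops tarfile from descending into it."""
    return None if is_vcs_path(info.name) else info


def repack(src_dir: str, out_path: str) -> None:
    """Write src_dir into a gzip-compressed tarball without VCS metadata."""
    top = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top, filter=drop_vcs)
```

For example, repack("spread-1.0", "spread_1.0.orig.tar.gz") (placeholder names) would skip spread-1.0/.git entirely, whether or not .git is also listed in Files-Excluded.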
Specifying a better compression method
Rafael Laboissiere extracted a patch that was merged into Andreas Tille's private repository, and filed a separate bug, 730768, for this enhancement.
You can specify a more suitable compression method with uscan --repack-compression <compression>, where <compression> is one of xz, bz2, gz or lzma. The current default is gz; the author is tempted to make xz the default.
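The four accepted values map naturally onto Python's standard compression modules, as the sketch below shows. COMPRESSORS and compress_tarball are hypothetical names, and mapping lzma to the legacy .lzma container (lzma.FORMAT_ALONE) rather than to xz (lzma.FORMAT_XZ) is an assumption about what the option means, not something taken from the uscan patch.

```python
from __future__ import annotations

import bz2
import gzip
import lzma

# Assumed mapping: option value -> (tarball extension, compressor).
COMPRESSORS = {
    "gz": ("gz", gzip.compress),
    "bz2": ("bz2", bz2.compress),
    "xz": ("xz", lambda data: lzma.compress(data, format=lzma.FORMAT_XZ)),
    "lzma": ("lzma", lambda data: lzma.compress(data, format=lzma.FORMAT_ALONE)),
}


def compress_tarball(tar_bytes: bytes, method: str = "gz") -> tuple[str, bytes]:
    """Compress an uncompressed tar archive; return (extension, data)."""
    try:
        ext, compress = COMPRESSORS[method]
    except KeyError:
        raise ValueError(f"unsupported compression: {method!r}") from None
    return ext, compress(tar_bytes)
```

The returned extension can be used to name the repacked tarball, e.g. an .orig.tar.xz for method "xz"; rejecting unknown values early mirrors the fixed list of choices the option accepts.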
Considerations about the debian/copyright pattern specification
The thread showed that the format is quite hard to understand. The next revision of the specification should explicitly mention that a pattern ending with / or beginning with ./ will never match anything (bug 688481).
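Both pitfalls can be reproduced with a small Python sketch. It assumes a pattern is anchored as ./PATTERN and compared, with fnmatch.fnmatchcase standing in for find -path, against paths as find prints them from the top directory. Under that assumption find never prints a directory with a trailing /, and prefixing ./release/Mac gives ././release/Mac, so neither form can match. pattern_warnings is a hypothetical lint check, not part of any existing tool.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def matches(path: str, pattern: str) -> bool:
    """Assumed anchoring: the pattern is applied as "./PATTERN" to a path
    printed by `find .`, e.g. "./release/Mac"."""
    return fnmatchcase(path, "./" + pattern)


def pattern_warnings(pattern: str) -> list[str]:
    """Flag pattern forms that can never match under this anchoring."""
    warnings = []
    if pattern.endswith("/"):
        warnings.append("ends with '/': find prints directories without a trailing slash")
    if pattern.startswith("./"):
        warnings.append("starts with './': patterns are already relative to the top directory")
    return warnings


for pat in ("release/Mac", "release/Mac/", "./release/Mac"):
    print(pat, matches("./release/Mac", pat), pattern_warnings(pat))
```

A check like this could warn packagers about such patterns instead of silently excluding nothing.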
Information for developers
Brackets in debian/copyright patterns should be escaped before being passed to find, because they are metacharacters for find but not in debian/copyright. Some shell metacharacters should also be escaped (consider the pattern "$(evil_command)"). Whether the actual unlink/rmdir actions are echoed should depend on command-line options, the environment, or the debug level. All of this should be checked once both implementations have been merged.
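A minimal Python sketch of these points, assuming the implementation anchors each pattern as ./PATTERN and runs find from the top of the unpacked tree; to_find_pattern, find_command and list_matches are hypothetical names. Brackets get a backslash so find treats them literally, while the copyright-format escapes \*, \? and \\ are copied unchanged, since find patterns also treat a backslash as quoting the next character. Shell metacharacters are sidestepped rather than escaped: passing find its arguments as an argv list means no shell ever sees $(evil_command). The verbose flag shows one way to echo commands depending on a debug setting.

```python
from __future__ import annotations

import shlex
import subprocess


def to_find_pattern(pattern: str) -> str:
    r"""Translate a debian/copyright pattern into a find -path pattern.

    In debian/copyright only * and ? are wildcards, so [ and ] are escaped
    for find. Escape pairs such as \*, \? and \\ mean the same thing in
    both syntaxes and are copied unchanged.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i : i + 2])  # keep an existing escape pair as-is
            i += 2
            continue
        out.append("\\" + ch if ch in "[]" else ch)
        i += 1
    return "".join(out)


def find_command(pattern: str) -> list[str]:
    """Build the find invocation as an argv list (assumed ./PATTERN anchoring).

    No shell parses this list, so "$(evil_command)" reaches find literally.
    """
    return ["find", ".", "-path", "./" + to_find_pattern(pattern)]


def list_matches(src_dir: str, pattern: str, verbose: bool = False) -> list[str]:
    """Run find inside src_dir and return the matching paths."""
    cmd = find_command(pattern)
    if verbose:
        print("+", shlex.join(cmd))  # echo the command in copy-pasteable form
    result = subprocess.run(cmd, cwd=src_dir, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Avoiding the shell is usually more robust than escaping for it, because there is no quoting rule left to get wrong.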