The exclusion of files was implemented in devscripts 2.13.5
Drop now outdated text
== Historically existing hacks to deal with the problem to some extent ==
Many packages call a script from the get-orig-source debian/rules target.
Some other maintainers let uscan call it from debian/watch.
Daniel Leidert calls [[http://anonscm.debian.org/viewvc/debichem/unstable/openbabel/debian/get-orig-source.sh?view=markup|a script]] from [[http://anonscm.debian.org/viewvc/debichem/unstable/openbabel/debian/watch?view=markup|debian/watch]].
Mike Hommey also calls [[http://anonscm.debian.org/gitweb/?p=pkg-mozilla/iceweasel.git;a=blob;f=debian/repack.py;h=a797d5471f20e0f8de155d483e5ad2f1b2c3bdc5;hb=c1ebf8be93add288837377e4fdd87f9c9f1082cc|a script]] from [[http://anonscm.debian.org/gitweb/?p=pkg-mozilla/iceweasel.git;a=blob;f=debian/watch;h=a81586cc891ab608c52717069c99703227e4d077;hb=c1ebf8be93add288837377e4fdd87f9c9f1082cc|debian/watch]] that filters the tarball while it is being downloaded, without extracting it to disk.
The set of files to remove is listed in a separate file that supports wildcards and extra sed-like filters.
This may look like a pointless optimization, but for huge source tarballs (say 80 MB bzipped) on slow download links, the whole process takes about as long as the download alone.
See [[http://anonscm.debian.org/gitweb/?p=pkg-mozilla/iceweasel.git;a=blob;f=debian/source.filter;h=ec7efac7b97add1f39480c07fecb4b70ae7a7ec8;hb=c1ebf8be93add288837377e4fdd87f9c9f1082cc|this example]].
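The idea of filtering while the tarball is streamed can be illustrated with a minimal Python sketch. It is not the script linked above: the function name `filter_tar_stream`, the pattern semantics (shell-style wildcards matched against the path below the top-level directory, with a matching directory taking its content with it) and the gzip output are assumptions made for this example, and the sed-like filters are left out.

```python
import fnmatch
import tarfile


def filter_tar_stream(src, dst, patterns):
    """Copy a tarball from src to dst, dropping members that match patterns.

    src and dst are binary file objects (a pipe from the download works too).
    Members are copied one by one, so nothing is extracted to disk.
    Hypothetical helper, not part of any Debian tool.
    """
    removed = []
    with tarfile.open(fileobj=src, mode="r|*") as tin, \
            tarfile.open(fileobj=dst, mode="w|gz") as tout:
        for member in tin:
            # Assumption: patterns apply to the path below the top-level
            # directory, so "pkg-1.0/lib/b.jar" is matched as "lib/b.jar".
            rel = member.name.split("/", 1)[-1]
            if any(fnmatch.fnmatchcase(rel, p) or
                   fnmatch.fnmatchcase(rel, p + "/*") for p in patterns):
                removed.append(member.name)
                continue
            # Only regular files carry data that has to be copied.
            data = tin.extractfile(member) if member.isreg() else None
            tout.addfile(member, data)
    return removed
```

Because both tarballs are opened in stream mode, the input is read strictly front to back, which is what makes it possible to filter during the download.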
Another variant is used by [[http://anonscm.debian.org/gitweb/?p=pkg-perl/scripts.git;a=tree|pkg-perl]], documented [[http://pkg-perl.alioth.debian.org/howto/repacking.html|here]].
There is also the `--filter-pristine-tar` option to `git-import-orig`. See this [[http://anonscm.debian.org/gitweb/?p=pkg-ocaml-maint/packages/why.git;a=blob;f=debian/gbp.conf;h=4435dcbe6d877cec7f562e8757939b7e98ecf5d8;hb=HEAD|gbp.conf example]]. `git-import-orig` may later be modified to ignore files excluded in debian/copyright, as uscan does. It already handles changing the compression scheme.
== Proposed triggering of repackaging a tarball ==
The idea is to add information to the source package that triggers the repackaging process automatically. Where to specify the removals has been discussed at length: see DebianBug:561494 and, later, [[https://lists.debian.org/debian-devel/2012/08/msg00380.html|a long debian-devel thread]], which finally led to the creation of this wiki page.
There seems to be a consensus (TM) to group in a single place the information about where files are copied from and why they may or may not be redistributed.
`debian/copyright` seems a natural candidate, even if its name suggests something less general. Both of the following implementation suggestions are based on this consensus.
When a tarball is repacked, the suffix `+dfsg` is appended to the version string to express the fact that the content of the original source was changed. This suffix should be configurable, in case upstream re-releases the same upstream version repackaged to fix a purely tarball-related issue.
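A minimal sketch of such a configurable suffix, in Python. The function name `repacked_version` and the rule of not appending the suffix twice are assumptions made for illustration, not a description of what uscan does.

```python
def repacked_version(upstream_version, suffix="+dfsg"):
    """Return the version string for a repacked tarball.

    Hypothetical helper: the suffix defaults to "+dfsg" but can be
    overridden, and it is not appended a second time.
    """
    if upstream_version.endswith(suffix):
        return upstream_version
    return upstream_version + suffix
```

With the default, `repacked_version("1.2.3")` gives `1.2.3+dfsg`; a different suffix can be passed as the second argument.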
To prevent uscan from repackaging automatically, use the `--no-exclusion` command-line option or set the `USCAN_NO_EXCLUSION` variable in /etc/devscripts.conf.
Ideally, the deletion could be executed from outside uscan too, in case the upstream tarball is generated from a VCS repository and uscan is never called.
This will only be useful until uscan understands all VCS kinds in the world.
It may be useful to let Lintian produce a warning when a file designated for removal still exists in the source package.
= Unifying the process to strip files with problematic copyright from upstream tarballs =
== Deleting files using the Files-Excluded field in debian/copyright ==
You can exclude files from a repackaged source tarball by adding a `Files-Excluded:` field with a list of patterns to debian/copyright (see also the proposal to update the specification in 685506). The patterns are matched using `find -path`.
Example debian/copyright file:
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: Spread
Source: https://github.com/phylogeography/SPREAD/downloads
Files-Excluded: *.jar release/Mac release/Windows release/tools bin classes .git
(Please also read the paragraph below about what else happens once files are removed.)
== Once we start removing files ==
The current implementation in the git repository from Andreas Tille does a bit more once repackaging becomes necessary:
=== Removing VCS cruft from tarball ===
When repackaging, `tar --exclude-vcs` is used. There is usually no point in having VCS metadata in upstream tarballs. Whether this option should be used unconditionally should be debated in a separate thread, but this is how the current implementation works. In the example above, specifying `.git` is therefore redundant because it is left out anyway.
=== Specifying better compression method ===
Rafael Laboissiere extracted a patch that was merged into the private repository of Andreas Tille and submitted a separate bug 730768 to deal with this enhancement.
You can specify a more reasonable compression method with `uscan --repack-compression <compression>`, where `<compression>` is one of xz, bz2, gz, or lzma. The current default is gz; the author is tempted to change the default to xz.
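What changing the compression amounts to can be shown with a short Python sketch using only the standard library. It is not how uscan implements the option; the function name `recompress` and the table of openers are assumptions made for the example. The tar archive inside is copied byte for byte, only the outer compression changes.

```python
import bz2
import gzip
import lzma
import shutil

# One opener per method named above; "lzma" is the legacy .lzma container.
OPENERS = {
    "gz": gzip.open,
    "bz2": bz2.open,
    "xz": lzma.open,
    "lzma": lambda path, mode: lzma.open(path, mode, format=lzma.FORMAT_ALONE),
}


def recompress(src_path, src_method, dst_path, dst_method):
    """Decompress src_path and write the same bytes to dst_path using
    dst_method (hypothetical helper)."""
    with OPENERS[src_method](src_path, "rb") as fin, \
            OPENERS[dst_method](dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

For example, `recompress("foo_1.0.orig.tar.gz", "gz", "foo_1.0.orig.tar.xz", "xz")` would produce an xz-compressed copy of a hypothetical `foo` tarball.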
== Considerations about debian/copyright pattern specification ==
The thread showed that understanding the format is quite difficult. The next revision should explicitly mention that a pattern ending with / or beginning with ./ will never match anything (688481).
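Until the specification says so, a packaging helper could warn about such patterns. A minimal Python sketch, in which the function name `never_matching` is hypothetical:

```python
def never_matching(patterns):
    """Return the patterns that can never match anything: those ending
    with "/" or beginning with "./"."""
    return [p for p in patterns if p.endswith("/") or p.startswith("./")]
```

Given `["release/Mac/", "./bin", "*.jar"]`, the first two patterns are reported and `*.jar` is not.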
== Information for developers ==
Brackets in debian/copyright patterns should be escaped before being passed to find, as they are metacharacters for find but not in debian/copyright. Also, some shell metacharacters should be escaped (consider the "$(evil_command)" pattern). The actual unlink/rmdir actions should be echoed depending on command-line options, environment, or debug level. All this should be checked once both implementations have been merged.
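Both concerns can be sketched in Python. The helper names are hypothetical, and the sketch assumes that patterns are relative to the top of the unpacked tree and that the tree's own path contains no wildcard characters. Rather than escaping shell metacharacters, it avoids the shell altogether by passing find an argument list.

```python
import re
import subprocess


def escape_for_find(pattern):
    """Backslash-escape "[" and "]", which are literal in debian/copyright
    but start a character class in find -path."""
    return re.sub(r"([\[\]])", r"\\\1", pattern)


def find_argv(root, pattern):
    """Build the argument vector for find (hypothetical helper; assumes
    the pattern is relative to root)."""
    return ["find", root, "-path", root + "/" + escape_for_find(pattern)]


def matching_paths(root, pattern):
    # The list form bypasses the shell, so "$(evil_command)" reaches find
    # as plain text and is never executed.
    result = subprocess.run(find_argv(root, pattern), check=True,
                            capture_output=True, text=True)
    return result.stdout.splitlines()
```

A caller that does need to build a shell command line could quote each argument with `shlex.quote` instead.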