The Debian duplication detector is a service that scans binary Debian packages and records hashes of the regular files they contain. It can then find files that are shipped in multiple packages, or multiple times in one package, and that could possibly be replaced by links to save space. Another use case is discovering embedded copies of code written in scripting languages. Note that a similar service called clonewise, which looks at source packages, is being worked on; it might be better suited for discovering embedded copies.
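The core idea can be illustrated on a single package unpacked into a directory (for example with `dpkg-deb -x foo.deb dir`). The shell sketch below is not the detector's actual implementation: `find_duplicates` is a hypothetical helper that hashes every regular file with `sha256sum` and prints groups of files whose hashes match. It skips symbolic links and counts hard links to the same inode only once, mirroring how the detector treats links. It assumes GNU findutils and coreutils and file names without newlines or backslashes.

```sh
#!/bin/sh
# Hypothetical helper, not the detector's code: print groups of regular files
# under directory $1 that have identical content.
# Assumes GNU find, xargs and uniq, and file names without newlines or backslashes.
find_duplicates() {
    # find -type f: regular files only, so symbolic links are never hashed.
    # %i is the inode number; awk keeps the first path seen for each inode,
    # so several hard links to one file are counted once.
    # sha256sum prints "<64 hex digits>  <path>"; after sorting, identical
    # files are adjacent, and uniq -w 64 prints every line whose hash repeats,
    # with a blank line between groups.
    find "$1" -type f -printf '%i %p\n' |
        awk '!seen[$1]++ { sub(/^[0-9]+ /, ""); print }' |
        xargs -r -d '\n' sha256sum |
        sort |
        uniq -w 64 --all-repeated=separate
}
```

Usage: `find_duplicates dir` prints nothing when every regular file in the tree is unique.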
Q: The PTS says that my package foo shares data with itself. How can that be?
A: This can happen when your package ships multiple copies of the same file. Each copy really does consume space on disk and on the mirrors. Hard links and soft links are both detected and are not reported as duplication. See the section "Within a single binary package".
Q: Why is there a sharing notice in the todo section of my package at all? I cannot do anything about it.
A: The current heuristic lists packages that share at least 1 MB of data, provided that this also amounts to at least 10% of their installed size. Being a heuristic, it can be wrong. To get the notice removed for your particular package, report a bug (see below).
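As a worked illustration of the heuristic, the hypothetical function below takes sizes in KiB, the unit of the Installed-Size control field, and succeeds only when both thresholds are met. Reading "1 MB" as 1024 KiB is an assumption; the detector may define it differently.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if a package would get the sharing notice.
# $1: shared data in KiB, $2: installed size in KiB.
needs_sharing_notice() {
    local shared_kib=$1 installed_kib=$2
    # Threshold 1: at least 1 MB shared (assumed here to be 1024 KiB).
    [ "$shared_kib" -ge 1024 ] || return 1
    # Threshold 2: shared >= 10% of installed, i.e. shared * 10 >= installed.
    # Multiplying instead of dividing keeps everything in integer arithmetic.
    [ $((shared_kib * 10)) -ge "$installed_kib" ]
}
```

For example, 2048 KiB shared out of 10000 KiB installed meets both thresholds, while 2048 KiB out of 50000 KiB is only about 4% and does not.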
Tips for reducing duplication in packages
Within a single binary package
If the software accessing the duplicate files supports symlinks, add the following to Build-Depends in debian/control:
Build-Depends: ..., rdfind, symlinks
Then run the following commands from debian/rules after the files have been installed by make install or similar:
# Replace duplicate files with symlinks
rdfind -outputname /dev/null -makesymlinks true debian/mypackage/
# Fix those symlinks to make them relative
symlinks -r -s -c debian/mypackage/
An example package using this technique is megaglest.
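To check the result of the symlinks step, a small sketch (not part of the original recipe) can list links whose target is still an absolute path. `list_absolute_symlinks` is a hypothetical helper relying on GNU find's `-lname` test. Debian Policy permits absolute links between top-level directories, so each hit needs inspection, but a link whose target names the build directory would dangle once the package is installed.

```sh
#!/bin/sh
# Hypothetical helper: list symbolic links under $1 whose target is absolute.
# Relies on GNU find's -lname test.
list_absolute_symlinks() {
    # -lname matches the link's stored target text, not the file it resolves
    # to; the pattern '/*' selects targets that begin with a slash.
    find "$1" -type l -lname '/*'
}
```

Usage: `list_absolute_symlinks debian/mypackage/` after the rdfind and symlinks commands have run.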
Within multiple binary packages from a single source package
If the duplicated files take up significant space, you might want to pool them in a foo-common package and have the other binary packages depend on it. If one particular package (such as foo-common) is required by all the other packages, consider using dh_installdocs --link-doc=foo-common, which makes their documentation directories symlinks to that package's documentation directory.
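To find candidates for a foo-common package, the staging trees of two binary packages can be compared. The sketch below is a hypothetical helper, `common_files`, that uses `cmp -s` to list paths shipped by both trees with byte-identical content; the package names in the usage line are placeholders. Identical files installed under different paths are not reported by this sketch.

```sh
#!/bin/sh
# Hypothetical helper: list relative paths that package trees $1 and $2 both
# ship as regular files with identical content.
# Assumes GNU find and file names without newlines.
common_files() {
    local other
    # Resolve the second tree to an absolute path before changing directory.
    other=$(cd "$2" && pwd) || return 1
    (
        cd "$1" || exit 1
        # %P prints each path relative to the starting point ".".
        find . -type f -printf '%P\n' | sort |
            while IFS= read -r rel; do
                # Report the path only if the other tree has a regular file
                # (not a symlink) at the same place with identical bytes.
                if [ -f "$other/$rel" ] && [ ! -L "$other/$rel" ] &&
                    cmp -s "$rel" "$other/$rel"; then
                    printf '%s\n' "$rel"
                fi
            done
    )
}
```

Usage: `common_files debian/foo-gtk debian/foo-qt` run after the files have been installed into the staging trees.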
Within multiple binary packages from multiple source packages
You should coordinate with the maintainers of the other source packages to come up with a solution.
Where the files come from embedded copies of other projects, those projects should be packaged separately; the packages containing the embedded copies should then drop those files and depend on the new packages instead.
The dh-linktree helper can assist with replacing embedded copies by symbolic links to files in other packages.
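Before replacing an embedded copy, it helps to confirm which files are exact copies of what the separately packaged project ships. The hypothetical helper `find_embedded_copies` below uses `find -exec cmp -s` to list files in a package tree that are byte-identical to a reference file. It only finds exact copies; an embedded copy at another version or with local changes will not match.

```sh
#!/bin/sh
# Hypothetical helper: list regular files under package tree $1 whose content
# is byte-identical to reference file $2 (e.g. one shipped by another package).
find_embedded_copies() {
    # -exec ... \; acts as a test that is true when cmp exits with status 0,
    # i.e. the files are identical; only then does -print run.
    find "$1" -type f -exec cmp -s {} "$2" \; -print
}
```

As an example, assuming libjs-jquery is installed: `find_embedded_copies debian/mypackage /usr/share/javascript/jquery/jquery.js`.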
Slides for DebConf
Bugs and known issues
If you discover a bug or want a new feature, email firstname.lastname@example.org.
A known limitation is that the PTS reports shared files between packages that contain different versions of the same software. At the moment, such cases are filtered out for wesnoth and python using regular expressions. If more filters are needed, report a bug.
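The filtering can be pictured as mapping each package name to a version-free key. In the sketch below, `software_key` and `is_version_pair` are hypothetical, and the regular expression is illustrative rather than the one the detector uses: it strips a version number directly after "wesnoth" or "python", so that, for instance, wesnoth-1.10-data and wesnoth-1.12-data yield the same key.

```sh
#!/bin/sh
# Hypothetical helper: print a version-free key for package name $1.
# The regular expression is illustrative, not the detector's actual filter.
software_key() {
    # Drop an optional "-" plus a version (a digit, then digits and dots)
    # right after the software name, keeping suffixes such as "-data".
    printf '%s\n' "$1" | sed -E 's/^(wesnoth|python)-?[0-9][0-9.]*/\1/'
}

# Succeeds if $1 and $2 are different names that reduce to the same key,
# i.e. sharing between them would be filtered out as a version pair.
is_version_pair() {
    [ "$1" != "$2" ] && [ "$(software_key "$1")" = "$(software_key "$2")" ]
}
```

With this expression, python2.7 and python3.11 both reduce to "python", while names such as wesnoth-music are left unchanged.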