Debian Policy Manual: 4.13. Embedded code copies
Some software packages include in their release distributions "convenience" copies of code from other software packages, generally so that users compiling from source don’t have to download multiple archives. Debian packages should not make use of these copies unless the included package is explicitly intended to be used in this way. If the included code is already in the Debian archive in the form of a library, the Debian packaging should ensure that binary packages reference the libraries already in Debian and not the embedded copy. If the included code is not already in Debian, it should be packaged separately as a prerequisite dependency, if possible.
Embedded Copies
Debian discourages embedded copies (vendoring) where possible
It is recommended that Debian packages do not ship embedded copies of code, data, fonts or other items. Instead, each such item should be packaged separately, with package dependencies used to ensure the needed items are installed.
Shipping embedded copies (also known as 'vendoring') is discouraged because:
- It makes it more likely that users are exposed to known security issues. Debian considers it better to have a single place to make security and other fixes. Fixing issues in vendored items requires more manual work, and embedded copies may be missed.
- Embedded items tend to be older versions that are no longer supported by the author of the embedded item. This leads to unfixed bugs.
- Multiple copies of the embedded items are needless duplication on user systems and in the Debian archive.
In practice, some upstreams explicitly design their software to vendor huge numbers of packages and removing the vendoring is impractical. Vendoring is reluctantly tolerated for non-libraries in some circumstances: see Debian Policy Manual: 4.13. Embedded code copies.
Packaging with embedded copies
When packaging software that has embedded copies, you should ask upstream to consider removing them from the upstream VCS and source tarballs. If upstream removes the embedded items, the Debian package can then be updated to the fixed version. Alternatives for upstream include:
- using dependencies: many ecosystems have package managers that support them;
- only embedding the copy in the binary tarballs they distribute;
- scripting the installation of dependencies; or
- bundling dependencies into a single, separate, tarball instead of embedding.
If upstream refuses to remove the embedded copies, then the Debian package should either:
- repack the upstream tarball using the Files-Excluded field in debian/copyright. This is particularly appropriate if there is a DFSG or size issue; or
- remove the files when building the package.
The removal can be done in the clean target of debian/rules, or early in the build target, to ensure the copy is not used in the build process.
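As a rough sketch of the second option, the shell function below shows the kind of removal step that a debian/rules target could invoke. The function name remove_embedded_copies and the path third_party/zlib are hypothetical and chosen only for illustration; the real paths depend on the upstream source tree.

```shell
#!/bin/sh
# Hypothetical helper: delete embedded copies from an unpacked source tree
# so that the build cannot pick them up by accident.
# Usage: remove_embedded_copies SRCDIR PATH...
remove_embedded_copies() {
    srcdir=$1
    shift
    for item in "$@"; do
        # Paths are relative to the source tree. "rm -rf" succeeds even if
        # the path is already gone, so the step can safely run more than once.
        rm -rf -- "${srcdir:?}/${item}"
        echo "removed embedded copy: ${item}"
    done
}
```

For example, `remove_embedded_copies . third_party/zlib` run from the top of the source tree would delete that (hypothetical) directory before the build starts.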
Tracking embedded copies
The list of packages that embed copies (including unused ones) of other projects is maintained in the security-tracker git repository. This list also contains information about forks so that the security team can check if all forks contain the same vulnerabilities.
All Debian members have commit access to the security-tracker repository, and others can send suggestions or additions to the debian-security-tracker mailing list.
Tools
Lintian
Lintian detects embedding of
- Common libraries written in
PostScript: copyrighted Adobe font fragments (without credit)
Others
check-all-the-things has a couple of tests (embed-readme, embed-dirs) for finding embedded copies via heuristics, and several ideas for new tests.
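A directory-name heuristic of this general kind can be approximated with find. The sketch below is not the check-all-the-things implementation; the function name list_suspect_dirs and the list of directory names are assumptions picked for illustration, and the list is far from exhaustive.

```shell
#!/bin/sh
# Hypothetical heuristic: list directories whose names commonly indicate
# bundled third-party code. The name list is an assumption, not exhaustive.
# Usage: list_suspect_dirs SRCDIR
list_suspect_dirs() {
    # -prune stops find from descending into a match, so nested hits
    # inside an already reported directory are not listed again.
    find "$1" -type d \( \
        -name vendor -o -name vendored -o \
        -name third_party -o -name 3rdparty -o \
        -name node_modules -o -name bundled \
    \) -prune -print | sort
}
```

A hit only means the directory deserves a closer look; a directory named vendor may contain something other than an embedded copy.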
These Gobby pages mention embedded copies: Embedded modules in inc (by the Debian Perl Team)
The Debian duplication detector detects duplicate files in binary packages and may be useful for detecting verbatim duplication of files across multiple binary packages.
Clonewise is a tool not yet in Debian that could be used to find unfixed vulnerabilities because of embedded code copies.
SourcererCC is another tool for detecting embedded code copies.
Sokrates can also do duplication detection.
JPlag finds pairwise similarities among a set of multiple programs.
The Debian Sources service allows searching for specific hashes and ctags throughout all Debian source code, which may be useful for detecting duplication of source code and data.
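A hash-based search needs the checksum of the file of interest. The sketch below assumes GNU coreutils (sha256sum, and uniq with the -w and --all-repeated options). It prints a file's SHA-256 digest and lists verbatim duplicates inside a local unpacked tree. The function names are hypothetical, and the duplicate check is only a local approximation, not a description of how the Debian Sources service works.

```shell
#!/bin/sh
# Print only the SHA-256 digest of a file, ready to paste into a hash search.
file_sha256() {
    sha256sum -- "$1" | cut -d' ' -f1
}

# List groups of byte-identical files under a directory.
# Output: "digest  path" lines, with groups separated by a blank line.
list_duplicate_files() {
    # A SHA-256 digest is 64 hex characters, so comparing only the first
    # 64 characters of each sorted line groups files by content.
    find "$1" -type f -exec sha256sum -- {} + \
        | sort \
        | uniq -w64 --all-repeated=separate
}
```

For example, `file_sha256 uniquename.c` gives the digest of a single file, while `list_duplicate_files .` reports identical files within the current tree.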
If you have a particular file with some interesting aspect (a security issue, etc.), you can likely find other copies using Debian Code Search or a similar external service, such as Black Duck Open Hub, SourceGraph Public Code Search or GitHub Search.
If a file has a fairly unique name, you can often find copies of that file by searching the contents of Debian binary or source packages using apt-file:
apt-file search uniquename.py
or
apt-file search -I dsc uniquename.c
Tracking
Various Debian folks keep track of embedded copies they found via usertags:
rbrito@ime.usp.br jwilk@debian.org mbehrle@debian.org pabs@debian.org sramacher@debian.org dr@jones.dk
See also
These wiki pages mention embedded copies: arc4random
External links
Fedora Packaging Guidelines: Bundling policy
Fedora Wiki: Bundled libraries
Gentoo Wiki: Bundled dependencies policy
Homebrew Documentation: Acceptable Formulae § Vendored dependencies policy
