Differences between revisions 39 and 57 (spanning 18 versions)
Revision 39 as of 2017-10-26 12:57:08
Size: 2997
Editor: ?jbicha
Comment: fix link to Debian Policy
Revision 57 as of 2022-05-07 01:48:40
Size: 5223
Editor: PaulWise
Comment: another duplication detector
Line 1: Line 1:
[[https://www.debian.org/doc/debian-policy/#convenience-copies-of-code|Debian Policy 4.13]] states that Debian packages should not use convenience copies. ## page was renamed from EmbeddedCodeCopies
[[https://www.debian.org/doc/debian-policy/ch-source.html#embedded-code-copies|Debian Policy 4.13]] states that Debian packages should not use convenience copies.
Line 3: Line 4:
The list of packages embedding code from other projects is maintained in the secure-testing svn repository. Embedded copies (of code, data, fonts or other things) should be removed from the upstream VCS and source tarballs. Upstream might want to only embed the copies in the binary packages they distribute, script the install of their dependencies and or bundle the dependencies into a single but separate source tarball rather than embedding copies of them. Once upstream has fixed the issue, the Debian package can then be updated to the fixed version. If upstream refuse to remove the embedded copies, then Debian should either repack the upstream tarball using Files-Excluded (if there is a DFSG or size issue) or remove the files in `debian/rules clean` and very early in `debian/rules build`, so that there is no chance of them being used by the build process.
Line 5: Line 6:
https://anonscm.debian.org/viewvc/secure-testing/data/embedded-code-copies?view=co The list of packages that embed copies (including unused ones) of other projects is maintained in the security-tracker git repository.
Line 7: Line 8:
This list also contains information about code forks so that the security team can check if all forks contain the same vulnerabilities. https://salsa.debian.org/security-tracker-team/security-tracker/raw/master/data/embedded-code-copies
Line 9: Line 10:
All Debian members have commit access to the secure-testing repository and others can send suggestions or additions to the [[DebianList:debian-security-tracker|debian-security-tracker mailing list]]. This list also contains information about forks so that the security team can check if all forks contain the same vulnerabilities.
Line 11: Line 12:
Lintian detects embedding of [[https://lintian.debian.org/tags/embedded-feedparser-library.html|feedparser]], common [[https://lintian.debian.org/tags/embedded-javascript-library.html|JavaScript]]/[[https://lintian.debian.org/tags/embedded-library.html|C/C++]]/[[https://lintian.debian.org/tags/embedded-pear-module.html|PEAR]]/[[https://lintian.debian.org/tags/embedded-php-library.html|PHP]] libraries and !PostScript fragments ([[https://lintian.debian.org/tags/license-problem-font-adobe-copyrighted-fragment.html|1]] [[https://lintian.debian.org/tags/license-problem-font-adobe-copyrighted-fragment-no-credit.html|2]]). All Debian members have commit access to the security-tracker repository and others can send suggestions or additions to the [[DebianList:debian-security-tracker|debian-security-tracker mailing list]].
Line 13: Line 14:
These wiki pages mention embedded code copies: [[arc4random]] Lintian detects embedding of [[https://lintian.debian.org/tags/embedded-feedparser-library.html|feedparser]], common [[https://lintian.debian.org/tags/embedded-javascript-library.html|JavaScript]]/[[https://lintian.debian.org/tags/embedded-library.html|C/C++]]/[[https://lintian.debian.org/tags/embedded-pear-module.html|PEAR]]/[[https://lintian.debian.org/tags/embedded-php-library.html|PHP]] libraries, !PostScript fragments ([[https://lintian.debian.org/tags/license-problem-font-adobe-copyrighted-fragment.html|1]] [[https://lintian.debian.org/tags/license-problem-font-adobe-copyrighted-fragment-no-credit.html|2]]) and [[https://lintian.debian.org/tags/duplicate-font-file.html|fonts]].
Line 15: Line 16:
The [[dedup.debian.net|Debian duplication detector]] detects duplicate files in binary packages and may be useful for detecting verbatim duplication of interpreted code and data. [[https://github.com/collab-qa/check-all-the-things/|check-all-the-things]] has a couple of tests (embed-readme, embed-dirs) for finding embedded copies via heuristics, and several ideas for new tests.
Line 17: Line 18:
[[https://github.com/silviocesare/Clonewise|Clonewise]] is a tool not yet in Debian that [[https://lists.debian.org/debian-security/2012/07/msg00000.html|could be used to find unfixed vulnerabilities because of embedded code copies]]. These wiki pages mention embedded copies: [[arc4random]]
Line 19: Line 20:
If you have a particular piece of code with some interesting aspect (security issue etc) you can likely find other copies using the [[DebianCodeSearch|Debian code search site]] or external code search engines such as [[https://code.ohloh.net/|Ohloh code]], [[https://searchcode.com/|searchcode]] and [[https://github.com/|GitHub]]. These gobby pages mention embedded copies: [[https://gobby.debian.org/export/Teams/Perl/Embedded_modules_in_inc|Teams/Perl/Embedded_modules_in_inc]].
Line 21: Line 22:
Various Debian folks keep track of embedded code copies they found via usertags: The [[dedup.debian.net|Debian duplication detector]] detects duplicate files in binary packages and may be useful for detecting verbatim duplication of files across multiple binary packages.

[[https://github.com/silviocesare/Clonewise|Clonewise]] is a tool not yet in Debian that [[https://lists.debian.org/msgid-search/CA+ygN1LxTeSFSt45qDC2KLKbYUWTqPvrm5ZHvEjjoEkuDL4f5g@mail.gmail.com/firsthit|could be used to find unfixed vulnerabilities because of embedded code copies]]. [[https://github.com/Mondego/SourcererCC|SourcererCC]] is another tool for detecting embedded code copies. [[https://www.sokrates.dev/|Sokrates]] can also do [[https://www.sokrates.dev/book/duplication|duplication detection]].

The [[https://sources.debian.org/|Debian Sources website]] collects hashes and ctags of all Debian source code and allows [[https://sources.debian.org/advancedsearch/|searching]] for specific hashes and ctags, which may be useful for detecting duplication of source code and data.

If you have a particular file with some interesting aspect (security issue etc) you can likely find other copies using the [[DebianCodeSearch|Debian code search site]] or external code search engines such as [[https://code.ohloh.net/|Ohloh code]], [[https://searchcode.com/|searchcode]] and [[https://github.com/|GitHub]].

If a file has a fairly unique name, you can often find copies of that file by searching the contents of Debian binary or source packages using apt-file:

{{{
apt-file search uniquename.py
apt-file search -I dsc uniquename.c
}}}

Various Debian folks keep track of embedded copies they found via usertags:
Line 25: Line 41:
[[https://udd.debian.org/cgi-bin/bts-usertags.cgi?tag=embedded-code-copy&user=mbehrle@debian.org|mbehrle@debian.org]]
Line 31: Line 48:
 * [[https://fedoraproject.org/wiki/Packaging:Guidelines#Bundling_and_Duplication_of_system_libraries|Fedora policy]]  * [[https://docs.fedoraproject.org/en-US/packaging-guidelines/#bundling|Fedora policy]] ([[https://fedoraproject.org/wiki/Bundled_Libraries|more]])

Debian Policy 4.13 states that Debian packages should not use convenience copies.

Embedded copies (of code, data, fonts or other things) should be removed from the upstream VCS and source tarballs. Instead of embedding copies, upstream might want to embed them only in the binary packages they distribute, script the installation of their dependencies, and/or bundle the dependencies into a single but separate source tarball. Once upstream has fixed the issue, the Debian package can be updated to the fixed version. If upstream refuses to remove the embedded copies, Debian should either repack the upstream tarball using Files-Excluded (if there is a DFSG or size issue) or remove the files in debian/rules clean and very early in debian/rules build, so that there is no chance of them being used by the build process.
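As a rough illustration of the "remove the files early" approach, the following shell sketch deletes a list of embedded copies from an unpacked source tree. The function name remove_embedded_copies and the example path third_party/zlib are hypothetical and not part of any Debian tool; a real package would run equivalent commands from its own debian/rules targets.

```shell
# Hypothetical helper, not part of any Debian tool.
# Usage: remove_embedded_copies ROOT RELATIVE_PATH...
# Deletes each RELATIVE_PATH below ROOT so the build cannot pick it up.
remove_embedded_copies() {
    root=$1
    shift
    for rel in "$@"; do
        # Refuse absolute paths and anything containing "..",
        # so that nothing outside ROOT can be deleted by mistake.
        case $rel in
            /*|*..*)
                echo "refusing suspicious path: $rel" >&2
                return 1
                ;;
        esac
        # -f keeps this quiet when the copy is already gone.
        rm -rf -- "$root/$rel"
    done
}

# Example (hypothetical path):
#   remove_embedded_copies . third_party/zlib
```

Running the removal in both the clean step and at the very start of the build step means a stale copy cannot be used by the build, even in a tree that was not cleaned first.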

The list of packages that embed copies (including unused ones) of other projects is maintained in the security-tracker git repository.

https://salsa.debian.org/security-tracker-team/security-tracker/raw/master/data/embedded-code-copies

This list also contains information about forks so that the security team can check if all forks contain the same vulnerabilities.

All Debian members have commit access to the security-tracker repository and others can send suggestions or additions to the debian-security-tracker mailing list.

Lintian detects embedding of feedparser, common JavaScript/C/C++/PEAR/PHP libraries, PostScript fragments (1 2) and fonts.

check-all-the-things has a couple of tests (embed-readme, embed-dirs) for finding embedded copies via heuristics, and several ideas for new tests.

These wiki pages mention embedded copies: arc4random

These gobby pages mention embedded copies: Teams/Perl/Embedded_modules_in_inc.

The Debian duplication detector detects duplicate files in binary packages and may be useful for detecting verbatim duplication of files across multiple binary packages.
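A similar check for verbatim duplicates can be approximated locally. The sketch below is not how the duplication detector itself works; it simply hashes every file under one or more directories (for example, several unpacked packages) and prints groups of files with identical contents. The function name list_duplicate_files is hypothetical, the uniq options used are GNU extensions, and file names are assumed not to contain backslashes or newlines.

```shell
# Hypothetical helper: print groups of files with identical contents.
# Usage: list_duplicate_files DIR...
list_duplicate_files() {
    # Hash every regular file, sort by digest, then keep only lines whose
    # first 64 characters (the SHA-256 hex digest) occur more than once.
    # Groups of identical files are separated by a blank line.
    find "$@" -type f -exec sha256sum {} + \
        | sort \
        | uniq -w 64 --all-repeated=separate
}

# Example (hypothetical directories):
#   list_duplicate_files unpacked/pkg-a unpacked/pkg-b
```

This only finds byte-for-byte identical files; copies that were patched or reformatted need a clone detector such as the tools mentioned on this page.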

Clonewise is a tool not yet in Debian that could be used to find unfixed vulnerabilities caused by embedded code copies. SourcererCC is another tool for detecting embedded code copies. Sokrates can also do duplication detection.

The Debian Sources website collects hashes and ctags of all Debian source code and allows searching for specific hashes and ctags, which may be useful for detecting duplication of source code and data.
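To search by hash, you first need the hash of the file you are interested in. The sketch below assumes the search expects a SHA-256 hex digest and prints just that digest so it can be pasted into the search form; the function name file_sha256 and the example file name are hypothetical.

```shell
# Hypothetical helper: print only the SHA-256 hex digest of a file.
# Usage: file_sha256 FILE
file_sha256() {
    # Reading from stdin keeps the file name out of the output,
    # so the first space-separated field is always the digest.
    sha256sum < "$1" | cut -d' ' -f1
}

# Example (hypothetical file name):
#   file_sha256 vendor/jquery.min.js
```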

If you have a particular file with some interesting aspect (a security issue, for example), you can likely find other copies using the Debian code search site or external code search engines such as Ohloh code, searchcode and GitHub.

If a file has a fairly unique name, you can often find copies of that file by searching the contents of Debian binary or source packages using apt-file:

apt-file search uniquename.py
apt-file search -I dsc uniquename.c

Various Debian folks keep track of embedded copies they found via usertags:

rbrito@ime.usp.br jwilk@debian.org mbehrle@debian.org pabs@debian.org sramacher@debian.org dr@jones.dk

See also