Differences between revisions 2 and 9 (spanning 7 versions)
Revision 2 as of 2016-06-15 02:39:29
Size: 2076
Editor: PaulWise
Comment: clean up, update with more options and external info
Revision 9 as of 2017-01-27 19:20:25
Size: 4603
Comment: Fixed typo
Line 1: Line 1:
Reviewing upstream packages to write debian/copyright files is tedious manual work. It tends not to get done again after initial packaging, especially not on every release (when something may have changed). Reviewing upstream packages to write debian/copyright files is tedious but important manual work. It is done during initial packaging and after every new upstream release.
Line 3: Line 3:
Making initial copyright file construction, and subsequent review/update easier will improve Debian's software quality. Making initial copyright file construction and subsequent review/update easier will improve Debian's software quality.
Line 5: Line 5:
Stretch (Debian 9) has significantly improved tools over previous releases.  Starting with Stretch (Debian 9) there are significantly improved tools over previous releases to help.
Line 7: Line 7:
`licensecheck` tool from DebianPackage:devscripts can scan source code and report found copyright holders and known licenses. Note that some of the tools listed here are run by `check-all-the-things -f copyright`.
Line 9: Line 9:
`scan-copyrights` from DebianPackage:libconfig-model-dpkg-perl can update an existing copyright file from rescanning the source. It can also create one from scratch. It uses `licensecheck`.  * `licensecheck` from DebianPackage:licensecheck (and older versions of DebianPackage:devscripts) can scan source code and report found copyright holders and known licenses.
Line 11: Line 11:
Config::Model can update Debian copyright files using the `cme` command (from DebianPackage:cme or DebianPackage:libconfig-model-dpkg-perl less than 2.063):  * `scan-copyrights` from DebianPackage:libconfig-model-dpkg-perl can update an existing copyright file from rescanning the source. It can also create one from scratch. It uses `licensecheck`.

 *
Config::Model can update Debian copyright files using the `cme` command (from DebianPackage:cme or DebianPackage:libconfig-model-dpkg-perl less than 2.063):
Line 17: Line 19:
A script from DebianPackage:cdbs can generate a copyright file using `licensecheck`:  * A script from DebianPackage:cdbs can generate a copyright file using `licensecheck`:
Line 24: Line 26:
`license-reconcile` compares the existing copyright with the source code and reports discrepancies.  * `license-reconcile` compares the existing copyright with the source code and reports discrepancies.
Line 26: Line 28:
`debmake -k` also compares the existing copyright with the source code and reports discrepancies.  * `debmake -k` also compares the existing copyright with the source code and reports discrepancies.
Line 28: Line 30:
`debmake -cc` generates a new copyright file from the source code.  * `debmake -cc` generates a new copyright file from the source code.
Line 30: Line 32:
`licensee` from DebianPackage:ruby-licensee checks LICENSE files and returns known license names.  * `decopy` generates debian/copyright files.
Line 32: Line 34:
[[https://www.fossology.org/|FOSSology]] is a open source license compliance software system and toolkit.  * `licensee` from DebianPackage:ruby-licensee checks LICENSE files and returns known license names. This is the [[https://github.com/benbalter/licensee| tool used by Github]] to provide a summary license indication on a repository main page. Its approach is to search for typical LICENSE file names or some package manifest (NPM, Bower, Gemfile, etc) and perform an exact or approximate license text matching against the set of common licenses texts as published at [[https://choosealicense.com]] (small: ~20). It output results in YAML format. This is a command line tool written in Ruby.
Line 34: Line 36:
Some of the above are run by `check-all-the-things -f copyright`.  * [[https://www.fossology.org/|FOSSology]] is a open source license compliance software system and toolkit that [[https://debconf16.debconf.org/talks/100/|can]] (in version 3.1) generate DEP5 copyright files. Its approach is to detect licenses with a either large (large:~6000 regexes) dataset of regex patterns (nomos) or a full string comparison against license full texts (large: ~400 text) (monk). It also detects copyright statements and does also integrate with Ninka (see below). This is a complete database-backed web application with some command line support written in C/C++ with a PHP frontend.

 * [[https://github.com/pivotal/LicenseFinder|LicenseFinder]] is a tool that "Find licenses for your project's dependencies." It does so by running application-specific package management tools and detecting package manifests to collect license-related metadata (e.g. Gemfile, etc) and detect licensing using regex against a set of common license texts (small: ~20). It output results in CSV, HTML and other report format. This is a command line tool written in Ruby.

 * [[https://github.com/dmgerman/ninka|Ninka]] is a "license identification tool for Source Code". Its approach is to detect licenses from text sentences using a dataset of key license sentences (large: ~600) and assemble the results based on the matched sentences. It output results in CSV format. This is a command line tool written in Perl.

 * [[https://github.com/nexB/scancode-toolkit/|ScanCode]] is a tool "to scan code and detect licenses, copyrights and more". Its approach is to detect licenses using a dataset of plain license texts (large:~1000 texts) and plain text notices (large:~2500 notices and mentions) and finds exact and approximate matches in source and binaries using full text alignments. It also detects copyright statements and collect license metadata from package manifests (e.g Maven, Pypi, etc.). It output results in JSON, HTML or SPDX format. This is a command line tool written in Python.
Line 38: Line 47:

Reviewing upstream packages to write debian/copyright files is tedious but important manual work. It is done during initial packaging and after every new upstream release.

Making initial copyright file construction and subsequent review/update easier will improve Debian's software quality.

Starting with Stretch (Debian 9), significantly improved tools are available to help with this.

Note that some of the tools listed here are run by check-all-the-things -f copyright.

  • licensecheck from licensecheck (and older versions of devscripts) can scan source code and report found copyright holders and known licenses.

  • scan-copyrights from libconfig-model-dpkg-perl can update an existing copyright file by rescanning the source. It can also create one from scratch. It uses licensecheck.

  • Config::Model can update Debian copyright files using the cme command (from cme, or from versions of libconfig-model-dpkg-perl older than 2.063):

    cme update dpkg-copyright

  • A script from cdbs can generate a copyright file using licensecheck:

    licensecheck --copyright -r `find * -type f` | \
      /usr/lib/cdbs/licensecheck2dep5 > debian/copyright.auto

  • license-reconcile compares the existing copyright with the source code and reports discrepancies.

  • debmake -k also compares the existing copyright with the source code and reports discrepancies.

  • debmake -cc generates a new copyright file from the source code.

  • decopy generates debian/copyright files.

  • licensee from ruby-licensee checks LICENSE files and returns known license names. This is the tool used by GitHub to provide a summary license indication on a repository's main page. Its approach is to search for typical LICENSE file names or a package manifest (NPM, Bower, Gemfile, etc.) and perform exact or approximate license text matching against the set of common license texts published at https://choosealicense.com (small: ~20). It outputs results in YAML format. This is a command line tool written in Ruby.

  • FOSSology is an open source license compliance software system and toolkit that can (in version 3.1) generate DEP5 copyright files. Its approach is to detect licenses with either a dataset of regex patterns (nomos; large: ~6000 regexes) or a full string comparison against full license texts (monk; large: ~400 texts). It also detects copyright statements and integrates with Ninka (see below). This is a complete database-backed web application with some command line support, written in C/C++ with a PHP frontend.

  • LicenseFinder is a tool that "Find licenses for your project's dependencies." It does so by running application-specific package management tools and detecting package manifests (e.g. Gemfile) to collect license-related metadata, and it detects licensing using regexes against a set of common license texts (small: ~20). It outputs results in CSV, HTML and other report formats. This is a command line tool written in Ruby.

  • Ninka is a "license identification tool for Source Code". Its approach is to detect licenses from text sentences using a dataset of key license sentences (large: ~600) and assemble the results based on the matched sentences. It outputs results in CSV format. This is a command line tool written in Perl.

  • ScanCode is a tool "to scan code and detect licenses, copyrights and more". Its approach is to detect licenses using a dataset of plain license texts (large: ~1000 texts) and plain text notices (large: ~2500 notices and mentions) and to find exact and approximate matches in source and binaries using full text alignments. It also detects copyright statements and collects license metadata from package manifests (e.g. Maven, PyPI, etc.). It outputs results in JSON, HTML or SPDX format. This is a command line tool written in Python.
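
Several of the tools above follow the same two steps: scan the source tree, then compare the result with the existing debian/copyright. The following is a minimal shell sketch of both steps, not a replacement for any of the tools. The names SCANNER, scan_tree, license_names and license_diff are hypothetical and exist only in this example. It assumes an xargs that supports -0 (GNU and BSD do), and it compares only the short names on "License:" lines, treating an expression such as "GPL-2+ or MIT" as one opaque string.

```shell
#!/bin/sh
# Sketch only: SCANNER, scan_tree, license_names and license_diff are
# hypothetical names introduced for this example.

# Run a scanner over every regular file below a directory. File names are
# passed NUL-separated, so names containing spaces stay in one piece.
# Files under debian/ and .git/ are skipped.
scan_tree() {
    (
        cd "${1:-.}" || exit 1
        find . -type f ! -path './debian/*' ! -path './.git/*' -print0 |
            xargs -0 ${SCANNER:-licensecheck --copyright}
    )
}

# Print the distinct "License:" short names of a machine-readable
# copyright file, one per line, sorted.
license_names() {
    sed -n 's/^License:[[:space:]]*//p' "$1" |
        sed 's/[[:space:]]*$//' | grep -v '^$' | sort -u
}

# Print names that appear in only one of two copyright files:
# unindented = only in the first, tab-indented = only in the second.
license_diff() {
    old=$(mktemp) || return 1
    new=$(mktemp) || return 1
    license_names "$1" > "$old"
    license_names "$2" > "$new"
    comm -3 "$old" "$new"
    rm -f "$old" "$new"
}
```

For example, after generating debian/copyright.auto with one of the tools above, `license_diff debian/copyright debian/copyright.auto` lists license names that only one of the two files mentions. Note that scan_tree emits paths with a leading "./", which downstream scripts may or may not expect.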

See also