Differences between revisions 11 and 53 (spanning 42 versions)
Revision 11 as of 2017-01-31 13:59:00
Size: 5992
Comment: Add section Other copyright files and license-related tools
Revision 53 as of 2022-04-12 05:54:24
Size: 18381
Comment: fix update URI to ghostscript
## Note on editing: Please use semantic newlines, to ease readability (also of emailed diffs).
#language en
||<tablestyle="width: 100%;" style="border: 0px hidden">~-[[DebianWiki/EditorGuide#translation|Translation(s)]]: [[de/CopyrightReviewTools|German]] - English -~ ||<style="text-align: right; border: 0px hidden"> (!) [[/Discussion|Discussion]]||
<<TableOfContents(2)>>

== Command-line tools in Debian ==

Reviewing upstream packages to write debian/copyright files is tedious but important manual work.
It is done during initial packaging and after every new upstream release.

Making initial copyright file construction and subsequent review/update easier will improve Debian's software quality.

Starting with Stretch (Debian 9) there are significantly improved tools over previous releases to help.

=== licensecheck ===

`licensecheck` from DebianPackage:licensecheck (and older versions of DebianPackage:devscripts) scans source code
and reports the copyright holders and known licenses it finds.
It detects licenses with a dataset of regex patterns and key phrases (medium: ~200 regexes)
and reassembles the matched parts into licenses based on rules.
In that sense it is somewhat similar to the combined approaches of Fossology/nomos and Ninka (see below for these tools).
It also detects copyright statements.
It outputs results in plain text (with customizable delimiter) or in Debian copyright file format.
Written in Perl.

{{{
licensecheck --check '.*' --recursive --deb-machine --lines 0 *
}}}
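
The idea of matching key phrases and then combining the matched parts by rule can be illustrated with a minimal sketch.
This is not how `licensecheck` is implemented:
the function name `guess_license` and the four phrases are assumptions chosen for illustration,
and a real dataset is far larger.

```shell
#!/bin/sh
# Illustrative only: a tiny key-phrase table, nowhere near a real dataset.
# guess_license is a hypothetical helper, not part of licensecheck.
guess_license() {
    file=$1
    name=UNKNOWN
    if grep -qi 'GNU General Public License' "$file"; then
        name=GPL
        # Rule: a version phrase refines the license family.
        if grep -qi 'version 3' "$file"; then
            name=GPL-3
        elif grep -qi 'version 2' "$file"; then
            name=GPL-2
        fi
        # Rule: the "or later" clause adds a "+" suffix.
        if grep -qi 'any later version' "$file"; then
            name="$name+"
        fi
    elif grep -qi 'Permission is hereby granted, free of charge' "$file"; then
        name=Expat
    elif grep -qi 'Licensed under the Apache License, Version 2\.0' "$file"; then
        name=Apache-2.0
    fi
    printf '%s\n' "$name"
}
```

Calling `guess_license path/to/file.c` prints one guessed short name.
Because `grep` works line by line, a phrase that is wrapped across two lines is missed,
which is one reason real tools normalize the text before matching.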

licensecheck notably does not extract metadata from binary files
[[DebianBug:828941|yet]],
and a recommended workaround is to couple it with [[DebianPackage:libimage-exiftool-perl|exiftool]]
like this:

{{{
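# Dump embedded metadata of each font into a sidecar text file named "<font>.ttf:meta"
1>&2 exiftool '-textOut!' %d%f.%e:meta -short -short -recurse -ext ttf .
# Scan everything except the binary fonts themselves (their sidecar files are scanned)
licensecheck --copyright --deb-machine --recursive --lines 0 --check '.*' --ignore '.*\.ttf$' -- *
# Remove the sidecar files again, leaving the fonts in place
find . -type f -name '*.ttf:meta' -delete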
}}}

For more elaborate and actively maintained examples,
see [[https://salsa.debian.org/debian/ghostscript/-/blob/debian/latest/debian/copyright-check|ghostscript]]
and [[https://salsa.debian.org/js-team/emscripten/-/blob/debian/latest/debian/copyright-check|emscripten]].

=== licensing ===

`licensing` from DebianPackage:licenseutils is primarily for adding license boilerplate to new code
but can also scan source code and report found licenses.
Written in C.

{{{
licensing detect *
}}}

(2020-09-19 - bug DebianBug:970580 created)

=== scan-copyrights ===

`scan-copyrights` from DebianPackage:libconfig-model-dpkg-perl can update an existing copyright file by rescanning the source.
It can also create one from scratch.
Written in Perl, using [[#licensecheck|licensecheck]].

=== cme ===

Config::Model can update Debian copyright files using the `cme` command
(from DebianPackage:cme, or DebianPackage:libconfig-model-dpkg-perl in versions before 2.063).
Written in Perl, using [[#licensecheck|licensecheck]].

{{{
cme update dpkg-copyright
}}}

Usage is detailed in [[https://github.com/dod38fr/config-model/wiki/Updating-debian-copyright-file-with-cme|Config::Model wiki]]

=== licensecheck2dep5 ===

A script from DebianPackage:cdbs can create a copyright file by tidying output from `licensecheck`.
Written in Perl, using [[#licensecheck]].

{{{
licensecheck --check '.*' --recursive --copyright --deb-fmt --lines 0 * | /usr/lib/cdbs/licensecheck2dep5
}}}

/!\ This tool is discouraged:
better use [[#licensecheck|licensecheck with exiftool]]

=== license-miner ===

A script from DebianPackage:cdbs can extract structured metadata embedded in binary content,
for subsequent parsing by [[#licensecheck]] and suffix stripping by [[#licensecheck2dep5]].
Written in Perl, using `Image::ExifTool` and `Font::TTF`.

{{{
find -type f -name '*.png' -print0 | perl -0 /usr/lib/cdbs/license-miner
licensecheck --check '.*' --ignore '.+\.png$' --recursive --copyright --deb-fmt --lines 0 * | /usr/lib/cdbs/licensecheck2dep5
find -type f -name '*.png.metadata' -delete
}}}

=== CDBS ===

A makefile from DebianPackage:cdbs can automate selection, mining, parsing, and cleanup.
It compares the previously autogenerated file `debian/copyright_hints` shipped in the source package
with a freshly autogenerated instance,
and warns about newly introduced (but not about disappearing) hints,
using [[#license-miner]] and [[#licensecheck]] and [[#licensecheck2dep5]] under the hood.
Written in make.

Typical use is to ship a package-specific script `debian/copyright-check` with the source package
and to execute that script manually (not as part of the normal build) when sources change:

{{{
#!/bin/sh

export DEB_COPYRIGHT_EXTRACT_EXTS="icc pdf png ttf"
export DEB_COPYRIGHT_EXTRACT_PATHS_EXIF="Resource/Font/"
export DEB_COPYRIGHT_CHECK_IGNORE_EXTS="cat ico xls pcl xps"
export DEB_COPYRIGHT_CHECK_IGNORE_PATHS="doc/.*\.htm"
export DEB_COPYRIGHT_CHECK_MERGE_SAME_LICENSE=yes

make -f /usr/share/cdbs/1/rules/utils.mk pre-build || true
make -f /usr/share/cdbs/1/rules/utils.mk clean DEB_COPYRIGHT_CHECK_STRICT=1
}}}
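
The comparison step, which warns about newly introduced hints but stays silent about hints that disappeared,
can be approximated with standard tools.
The following is a rough sketch and not the CDBS implementation:
`new_hints` is a hypothetical helper,
and it compares whole lines, whereas the real hints file is organized in paragraphs.

```shell
#!/bin/sh
# Hypothetical helper: print lines present in the fresh hints file
# but absent from the stored one. Lines that disappeared are ignored.
new_hints() {
    old=$1
    new=$2
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # comm requires sorted input; -u also drops repeated lines.
    sort -u "$old" > "$old_sorted"
    sort -u "$new" > "$new_sorted"
    # -1 hides lines unique to the old file, -3 hides common lines.
    comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

For example, `new_hints debian/copyright_hints fresh_hints` (the second file name is an assumption) prints nothing when no new hint appeared.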

/!\ This tool is discouraged:
better use [[#licensecheck|licensecheck with exiftool]]

=== license-reconcile ===

`license-reconcile` compares the existing copyright with the source code and reports discrepancies.
Written in Perl, using [[#licensecheck|licensecheck]].
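
Any comparison of this kind starts from what the existing copyright file already declares.
As a minimal sketch, unrelated to how `license-reconcile` works internally,
the declared pairs can be listed from a machine readable copyright file with `awk`.
The function name `list_declared_licenses` is an assumption,
and only the first line of each `Files:` and `License:` field is read.

```shell
#!/bin/sh
# Hypothetical helper: print "license<TAB>files-pattern" for every
# paragraph that has both a Files: and a License: field.
list_declared_licenses() {
    awk '
        function emit() {
            if (files != "" && license != "")
                printf "%s\t%s\n", license, files
            files = ""
            license = ""
        }
        /^[ \t]*$/  { emit(); next }    # a blank line ends a paragraph
        /^Files:/   { sub(/^Files:[ \t]*/, ""); files = $0; next }
        /^License:/ { sub(/^License:[ \t]*/, ""); license = $0; next }
        END         { emit() }
    ' "${1:-debian/copyright}"
}
```

The header paragraph and stand-alone `License:` paragraphs have no `Files:` field and are therefore skipped.
The output can then be compared by hand or by script with what a scanner reports for the same paths.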

=== debmake ===

`debmake -k` also compares the existing copyright with the source code and reports discrepancies.

`debmake -cc` generates a new copyright file from the source code.


=== license-detector ===

`license-detector` scans quickly for licenses within paths, and loosely summarizes a percentage covered by each SPDX license.
Written in Go.

=== decopy ===

[[https://salsa.debian.org/debian/decopy|decopy]] is a tool that "automates creating and updating the debian/copyright files."
It also "aims to detects as many licenses as possible" which makes it a tool for license detection too.
Its approach to detecting licenses is the same as `license-checker`.
Written in Python, using [[#python-debian|python-debian]] to handle Debian machine readable copyright files.

=== licensee ===

`licensee` from DebianPackage:ruby-licensee checks LICENSE files and returns known license names.
This is the [[https://github.com/benbalter/licensee|tool used by GitHub]] to provide a summary license indication on a repository's main page.
Its approach is to search for typical LICENSE file names or some package manifest (NPM, Bower, Gemfile, etc.)
and perform exact or approximate license text matching against the set of common license texts
as published at [[https://choosealicense.com]] (small: ~20).
It outputs results in YAML format.
Written in Ruby.

=== check-all-the-things ===

Wrapper for some of the other tools listed here.

{{{
check-all-the-things -f copyright
}}}

=== cargo-lichking ===

Automated license checking for Rust.
`cargo lichking` is a Cargo subcommand that checks licensing information for dependencies,
based on [[http://www.dwheeler.com/essays/floss-license-slide.html|David A. Wheeler's compatibility graph]].

{{{
cargo lichking check
}}}

=== reuse ===
Tool for checking and helping with compliance with the [[https://reuse.software|REUSE Initiative]] recommendations.
Uses a combination of SPDX license identifiers and Debian machine readable copyright files to document licensing in a project.
Written in Python.
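
The SPDX half of that combination relies on `SPDX-License-Identifier:` tags placed in file headers.
A minimal sketch of collecting such tags with standard tools follows.
It is not the `reuse` tool itself,
and the function name `list_spdx_tags` is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print "path<TAB>expression" for every
# SPDX-License-Identifier tag found in regular files below a directory.
list_spdx_tags() {
    find "${1:-.}" -type f -exec awk '
        /SPDX-License-Identifier:/ {
            id = $0
            # Drop everything up to and including the tag name.
            sub(/.*SPDX-License-Identifier:[ \t]*/, "", id)
            # Drop a trailing comment closer and trailing whitespace.
            sub(/[ \t]*(\*\/|-->)?[ \t\r]*$/, "", id)
            printf "%s\t%s\n", FILENAME, id
        }' {} +
}
```

Running `list_spdx_tags src | sort` gives a stable per-file listing.
Files without a tag produce no output, so they still need manual review or one of the scanners above.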


== Libraries in Debian ==

=== python-debian ===

[[https://packages.qa.debian.org/p/python-debian.html|python-debian]] supports parsing and creating copyright files (and other Debian-style files such as description, control, Sources, Packages, etc.).
Written in Python.

== Command-line tools not in Debian ==

=== license_finder ===

[[https://github.com/pivotal/LicenseFinder|LicenseFinder]] is a tool that "Find licenses for your project's dependencies."
It does so by running application-specific package management tools
and detecting package manifests to collect license-related metadata (e.g. Gemfile, etc.),
and detects licensing using regexes against a set of common license texts (small: ~20) and license names.
It outputs results in CSV, HTML and other report formats.
Written in Ruby.

=== licensed ===

[[https://github.com/github/licensed|licensed]] is used to check the licenses of the dependencies of a project.
Modern language package managers (bower, bundler, cabal, go, npm, stack) are used to pull the dependency chain of a specific project.
Licenses can be configured to be either accepted or rejected,
easing the developer task of identifying problematic dependencies when importing a new third-party library.
Uses github/licensee for license detection.
Written in Ruby.


=== scancode-toolkit ===

[[https://github.com/nexB/scancode-toolkit/|ScanCode]] is a tool "to scan code and detect licenses, copyrights and more".
Its approach is to detect licenses using a dataset of plain license texts (large: ~1,760 texts),
available as an online [[https://scancode-licensedb.aboutcode.org/|licensedb]],
and a comprehensive library of license notices, mentions and references (large: ~30,000 notices and mentions).
!ScanCode finds exact and approximate matches in source and binaries
using a combination of checksums, automatons, and full text alignments (e.g. diffs) as well as SPDX license identifiers.
It can return the exact matched text (and the parts of a text that are not matched, e.g. added or removed).
It detects and normalizes structured license tags in package manifests,
including the ability to parse, detect and normalize Debian copyright files,
with special support for structured DEP-5 machine readable files using the [[https://github.com/nexB/debian-inspector|debian-inspector]] library.
It can also output a Debian copyright format,
and can collect license information for installed packages by processing the status file.
It also detects copyright statements and collects license metadata from package manifests (e.g. Maven, npm, rpm, Debian, Cargo, Cocoapods, Bower, Composer, Pypi, Alpine, and many more).
It outputs results in JSON, YAML, HTML, Debian copyright or SPDX format.
It is written in Python with some native C/C++ extensions.


=== apache-rat ===

[[https://github.com/apache/creadur-rat/|Apache Creadur rat]] is a "tool to improve accuracy and efficiency when checking releases."
Its goal is to help Apache Software Foundation projects comply with the release policy, including detecting licenses.
Its approach is to use a dataset of key sentences (small: ~20).
Written in Java.


=== cargo-deny ===

[[https://crates.io/crates/cargo-deny|cargo-deny]] is a plugin for Cargo, the Rust package manager, that recursively checks project-wide licensing hints for all dependent Rust crates and checks that they match a set of allow/deny conditions.
Written in Rust.


=== Other tools that need further detailing and review ===

 * [[https://github.com/daald/dpkg-licenses|daald/dpkg-licenses]] "A command line tool which lists the licenses of all installed packages in a Debian-based system (like Ubuntu)". Written in Shell script.

 * [[https://github.com/mwittig/npm-license-crawler|mwittig/npm-license-crawler]] "Analyzes license information for multiple node.js modules (package.json files) as part of your software project". Written in !JavaScript.

 * [[https://github.com/fossology/atarashi|fossology/atarashi]] "Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology". Written in Python.

 * [[https://github.com/heremaps/oss-review-toolkit|heremaps/oss-review-toolkit]] "A suite of tools to assist with reviewing Open Source Software dependencies. http://oss-review-toolkit.org/". Written in Kotlin.

 * [[https://github.com/google/licenseclassifier|google/licenseclassifier]] "A License Classifier". Written in Go.

 * [[https://github.com/google/licensecheck|google/licensecheck]] "The licensecheck package classifies license files and heuristically determines how well they correspond to known open source licenses". Written in Go.

 * [[https://github.com/google/go-licenses|google/go-licenses]] "Reports on the licenses used by a Go package and its dependencies". Written in Go. Uses google/licenseclassifier for license detection.

 * [[https://github.com/src-d/go-license-detector|src-d/go-license-detector]] "Reliable project licenses detector." Written in Go. Less active since source-d closed shop.

 * [[https://github.com/amzn/askalono|amzn/askalono]] "A tool & library to detect open source licenses from texts https://amzn.to/askalono". Written in Rust.

 * [[https://github.com/boyter/lc|boyter/lc]] "licensechecker (lc) a command line application which scans directories and identifies what software license things are under producing reports as either SPDX, CSV, JSON, XLSX or CLI Tabular output". Written in Go.

 * [[https://github.com/nexB/debian-inspector|debian-inspector]] "A python library to parse Debian deb822-style control and copyright files". This library can parse copyright and control files (similar to python-debian). Written in Python. Previously called "debut".

 * [[https://github.com/nexB/license-expression/|license-expression]] "Utility library to parse, normalize and compare License expressions using a boolean logic engine. For expressions using SPDX or any other license id scheme." Written in Python.


== Applications ==

=== fossology ===

[[https://www.fossology.org/|FOSSology]] is an open source license compliance software system and toolkit
that [[https://debconf16.debconf.org/talks/100/|can]] (in version 3.1) generate DEP5 copyright files.
Its approach is to detect licenses with either a large dataset of regex patterns (nomos; large: ~2500 regexes)
or a full string comparison against full license texts (monk; large: ~400 texts).
It also detects copyright statements and integrates with Ninka (see below).
This is a complete database-backed web application with some command line support written in C/C++ with a PHP frontend.


=== scancode.io ===

[[https://scancodeio.readthedocs.io|ScanCode.io]] is a "server to script and automate the process of Software Composition Analysis (SCA) to identify any open source components and their license compliance data in an application’s codebase.
!ScanCode.io can be used for various use cases, such as Docker container and VM composition analyses, among other applications."
It embeds !ScanCode as its primary detection tool.
[[https://github.com/nexB/scancode.io|ScanCode.io]] can analyze the licensing of a complete installed Debian system, such as a Docker image or a VM image,
and provides a web UI, a JSON REST API, a command-line interface, JSON and XLSX outputs, and a plugin API for extending and creating custom analysis pipelines.
It is written in Python with Django and PostgreSQL.



== Obsolete code ==

=== OSLC ===
[[https://forge.ow2.org/projects/oslcv3/|OSLCv3]] Open Source License Checker 3.0 is a "risk management tool for analyzing open source software licenses."
It detects licenses using key sentences and diffs using a dataset of license texts (small: ~50).
It is developed in Java and appears to have been unmaintained since 2009.

=== ninka ===

[[https://github.com/dmgerman/ninka|Ninka]] is a "license identification tool for Source Code".
Its approach is to detect licenses from text sentences using a dataset of key license sentences (large: ~600)
and assemble the results based on the matched sentences.
It outputs results in CSV format.
Written in Perl. Unmaintained since 2017.

=== jninka ===

[[https://github.com/whitesource/jninka/|jninka]] is a port from Perl to Java of `ninka`.
Written in Java. Unmaintained/retired project.

=== slic ===

[[https://github.com/gerv/slic|gerv/slic]] "Speedy LIcense Checker and associated tools". Written in Python. No longer maintained since the death of its author.


=== dlt ===

[[https://github.com/agustinhenze/dlt/|dlt]] has support for parsing and creating Debian machine readable copyright files. Written in Python. Unmaintained/retired project.


=== jfrog/go-license-discovery ===
 * [[https://github.com/jfrog/go-license-discovery|jfrog/go-license-discovery]] "A go library for matching text against known OSS licenses". Written in Go. Uses google/licenseclassifier for license detection. No longer maintained.

=== codeauroraforum/lid ===
 * [[https://github.com/codeauroraforum/lid|codeauroraforum/lid]] "License Identifier. The purpose of this program, license_identifier, is to scan the source code files and identify the license text region and the type of license.". Written in Python. No longer maintained.



== See also ==
 * Bachelor Thesis: [[https://osr.cs.fau.de/2019/08/07/final-thesis-a-comparison-study-of-open-source-license-crawler/|A Comparison Study of Open Source License Crawlers]] ([[https://osr.cs.fau.de/wp-content/uploads/2019/08/wolter_2019.pdf|PDF]]) by Thomas Wolter
 * [[CopyrightReview|Peer review for copyright files]]
 * [[https://github.com/maxhbr/LicenseScannerComparison]] A comparison of license scanners.
 * [[https://osr.cs.fau.de/2019/08/07/final-thesis-a-comparison-study-of-open-source-license-crawler/]] and [[https://web.archive.org/web/20200128142101/https://osr.cs.fau.de/wp-content/uploads/2019/08/wolter_2019.pdf]] A comparison of license scanners.

 * [[https://clearlydefined.io/| ClearlyDefined]] Massive license scanning (with scancode) and peer review for license clarity and correctness.
 * [[https://salsa.debian.org/eighthave/debian-copyright-survey|debian-copyright-survey.py]] scrape and cache copyright files from all Debian packages
 * [[https://github.com/topics/license-management|GitHub license-management category]]

----

CategoryPackaging

Translation(s): German - English

(!) ?Discussion

Command-line tools in Debian

Reviewing upstream packages to write debian/copyright files is tedious but important manual work. It is done during initial packaging and after every new upstream release.

Making initial copyright file construction and subsequent review/update easier will improve Debian's software quality.

Starting with Stretch (Debian 9) there are significantly improved tools over previous releases to help.

licensecheck

licensecheck from licensecheck (and older versions of devscripts) can scan source code and report found copyright holders and known licenses. Its approach is to detect licenses with a dataset (medium:~200 regexes) of regex patterns and key phrases (parts) and to reassemble these in detected licenses based on rules. In that sense this is somewhat similar to the combined approaches of Fossology/nomos and Ninka (see below for these tools). It also detects copyright statements. It output results in plain text (with customizable delimiter) or a Debian copyright file format. Written in Perl.

licensecheck --check '.*' --recursive --deb-machine --lines 0 *

licensecheck notably does not extract metadata from binary files yet, and a recommended workaround is to couple it with exiftool like this:

1>&2 exiftool '-textOut!' %d%f.%e:meta -short -short -recurse -ext ttf .
licensecheck --copyright --deb-machine --recursive --lines 0 --check '.*' --ignore '.*\.ttf$' -- *
find -type f -name '*.ttf' -delete

For more elaborate and actively maintained examples, see ghostscript and emscripten.

licensing

licensing from licenseutils is primarily for adding license boilerplate to new code but can also scan source code and report found licenses. Written in C.

licensing detect *

(2020-09-19 - bug 970580 created)

scan-copyrights

scan-copyrights from libconfig-model-dpkg-perl can update an existing copyright file from rescanning the source. It can also create one from scratch. Written in Perl, using licensecheck.

cme

Config::Model can update Debian copyright files using the cme command (from cme or libconfig-model-dpkg-perl less than 2.063). Written in Perl, using licensecheck.

cme update dpkg-copyright

Usage is detailed in Config::Model wiki

licensecheck2dep5

A script from cdbs can create a copyright file by tidying output from licensecheck. Written in Perl, using #licensecheck.

licensecheck --check '.*' --recursive --copyright --deb-fmt --lines 0 * | /usr/lib/cdbs/licensecheck2dep5

/!\ This tool is discouraged - better use ?#licensecheck

license-miner

A script from cdbs can extract structured metadata embedded in binary content, for subsequent parsing by #licensecheck and suffix stripping by #licensecheck2dep5. Written in Perl, using Image::ExifTool and Font::TTF.

find -type f -name '*.png' -print0 | perl -0 /usr/lib/cdbs/license-miner
licensecheck --check '.*' --ignore '.+\.png$' --recursive --copyright --deb-fmt --lines 0 * | /usr/lib/cdbs/licensecheck2dep5
find -type f -name '*.png.metadata' -delete

CDBS

A makefile from cdbs can automate selection, mining, parsing, and cleanup, comparing previously autogenerated file debian/copyright_hints included with source package with freshly autogenerated instance and warning about newly introduced (but not disappearing) changes to discovered hints, using #license-miner and #licensecheck and #licensecheck2dep5 under the hood. Written in make.

Typical use is by shipping a package-specific script `debian/copyright-check with source package and executing that script manually (not as part of normal build) when sources change:

export DEB_COPYRIGHT_EXTRACT_EXTS="icc pdf png ttf"
export DEB_COPYRIGHT_EXTRACT_PATHS_EXIF="Resource/Font/"
export DEB_COPYRIGHT_CHECK_IGNORE_EXTS="cat ico xls pcl xps"
export DEB_COPYRIGHT_CHECK_IGNORE_PATHS="doc/.*\.htm"
export DEB_COPYRIGHT_CHECK_MERGE_SAME_LICENSE=yes

make -f /usr/share/cdbs/1/rules/utils.mk pre-build || true
make -f /usr/share/cdbs/1/rules/utils.mk clean DEB_COPYRIGHT_CHECK_STRICT=1

/!\ This tool is discouraged - better use ?#licensecheck

license-reconcile

license-reconcile compares the existing copyright with the source code and reports discrepancies. Written in Perl, using licensecheck.

debmake

debmake -k also compares the existing copyright with the source code and reports discrepancies.

debmake -cc generates a new copyright file from the source code.

license-detector

license-detector scans quickly for licenses within paths, and loosely summarizes a percentage covered by each SPDX license. Written in Go.

decopy

decopy is a tool that "automates creating and updating the debian/copyright files." It also "aims to detects as many licenses as possible" which makes it a tool for license detection too. It uses python-debian to handle Debian machine readable copyright files. Its approach to detect licenses is the same as license-checker. Written in Python, using python-debian.

licensee

licensee from ruby-licensee checks LICENSE files and returns known license names. This is the tool used by Github to provide a summary license indication on a repository main page. Its approach is to search for typical LICENSE file names or some package manifest (NPM, Bower, Gemfile, etc) and perform an exact or approximate license text matching against the set of common licenses texts as published at https://choosealicense.com (small: ~20). It output results in YAML format. Written in Ruby.

check-all-the-things

Wrapper for some of the other tools listed here.

check-all-the-things -f copyright

cargo-lichking

Automated license checking for rust. cargo lichking is a Cargo subcommand that checks licensing information for dependencies, based on David A. Wheeler's compatibility graph.

cargo lichking check

reuse

Tool for checking and helping with compliance with the REUSE Initiative recommendations. Uses a combination of SPDX license identifiers and Debian machine readable copyright files to document license in a project. Written in Python.

Libraries in Debian

python-debian

python-debian has support parsing and creating copyright files (and any Debian-style files such as description, control, Sources, Packages, etc.) Written in Python.

Command-line tools not in Debian

license_finder

LicenseFinder is a tool that "Find licenses for your project's dependencies." It does so by running application-specific package management tools and detecting package manifests to collect license-related metadata (e.g. Gemfile, etc) and detect licensing using regex against a set of common license texts (small: ~20) and license names. It outputs results in CSV, HTML and other report format. Written in Ruby.

licensed

licensed is used to check the licenses of the dependencies of a project. Modern language package managers (bower, bundler, cabal, go, npm, stack) are used to pull the dependency chain of a specific project. Licenses can be configured to be either accepted or rejected, easing the developer task of identifying problematic dependencies when importing a new third-party library. Use github/licensee for license detection. Written in Ruby.

scancode-toolkit

ScanCode is a tool "to scan code and detect licenses, copyrights and more". Its approach is to detect licenses using a dataset of plain license texts (large:~1,760 texts) available as an online licensedb; and a comprehensive library of license notices, mentions and references (large:~30,000 notices and mentions). ?ScanCode finds exact and approximate matches in source and binaries using a combination of checksums, automatons, and full text alignments (e.g. diffs) as well as SPDX license identifiers. It can return the exact matched text (and the parts of a text that are not matched e.g. added or removed). It detects and normalizes structured license tags in package manifests including the ability to parse, detect and normalize Debian copyright files, with special support for structured DEP-5 machine readable files using the debian-inspector library. It can also output a Debian copyright format. And can collect license information for installed packages from processing the status file. It also detects copyright statements and collects license metadata from package manifests (e.g Maven, npm, rpm, Debian, Cargo, Cocoapods, Bower, Composer, Pypi, Alpine, and many more). It output results in JSON, YAML HTML, Debian copyright or SPDX format. It is written in Python with some native C/C++ extensions.

apache-rat

Apache Creadur rat is a "tool to improve accuracy and efficiency when checking releases." . Its goal is to help Apache Foundation projects to comply with the release policy including detecting licenses. Its approach is to use a key sentences dataset (small: ~20). Written in Java.

cargo-deny

cargo-deny is a plugin to Rust helper tool cargo, to recursively check project-wide licensing hints for all dependent Rust crates, and check that they match a set of allow/deny condidtions. Written in Rust.

Other tools that need further detailing and review

  • daald/dpkg-licenses "A command line tool which lists the licenses of all installed packages in a Debian-based system (like Ubuntu)". Written in Shell script.

  • mwittig/npm-license-crawler "Analyzes license information for multiple node.js modules (package.json files) as part of your software project". Written in JavaScript.

  • fossology/atarashi "Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology". Written in Python.

  • heremaps/oss-review-toolkit "A suite of tools to assist with reviewing Open Source Software dependencies. http://oss-review-toolkit.org/". Written in Kotlin.

  • google/licenseclassifier "A License Classifier". Written in Go.

  • google/licensecheck "The licensecheck package classifies license files and heuristically determines how well they correspond to known open source licenses". Written in Go.

  • google/go-licenses "Reports on the licenses used by a Go package and its dependencies". Written in Go. Uses google/licenseclassifier for license detection.

  • src-d/go-license-detector "Reliable project licenses detector." Written in Go. Less active since source-d closed shop.

  • amzn/askalono "A tool & library to detect open source licenses from texts https://amzn.to/askalono". Written in Rust.

  • boyter/lc "licensechecker (lc) a command line application which scans directories and identifies what software license things are under producing reports as either SPDX, CSV, JSON, XLSX or CLI Tabular output". Written in Go.

  • debian-inspector "A python library to parse Debian deb822-style control and copyright files". This library can parse copyright and control files (similar to python-debian). Written in Python. Previously called "debut".

  • license-expression "Utility library to parse, normalize and compare License expressions using a boolean logic engine. For expressions using SPDX or any other license id scheme." Written in Python.
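As background for the deb822-style files mentioned under debian-inspector above, the following is a minimal sketch of how such a file breaks down into paragraphs of fields. It is not debian-inspector's API: the function name is made up, and the sketch ignores details that a real parser handles (such as " ." lines standing for blank lines inside a field).

```python
from typing import Dict, List, Optional


def parse_deb822(text: str) -> List[Dict[str, str]]:
    """Split deb822-style text into paragraphs of field/value pairs."""
    paragraphs: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_field: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current paragraph.
            if current:
                paragraphs.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field is not None:
            # An indented line continues the previous field.
            current[last_field] += "\n" + line.strip()
        elif line.startswith("#"):
            continue  # comment line
        else:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


# Illustrative debian/copyright content; the names are placeholders.
sample = "\n".join([
    "Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/",
    "Upstream-Name: example",
    "",
    "Files: *",
    "Copyright: 2020 Jane Doe",
    " 2021 John Roe",
    "License: Expat",
])
paragraphs = parse_deb822(sample)
print(paragraphs[1]["Copyright"])
```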

Applications

fossology

FOSSology is an open source license compliance software system and toolkit that can (in version 3.1) generate DEP5 copyright files. Its approach is to detect licenses either with a dataset of regex patterns (nomos; large: ~2500 regexes) or with a full string comparison against license full texts (monk; large: ~400 texts). It also detects copyright statements and integrates with Ninka (see below). This is a complete database-backed web application with some command line support, written in C/C++ with a PHP frontend.
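The two detection styles differ in a way that a small sketch makes visible. The code below is not taken from nomos or monk: the two patterns, the reference text and all names are assumptions made for the example. It only contrasts "match a short regex for a telltale phrase" with "find the whole normalized license text inside the file".

```python
import re
from typing import Dict, List

# Hypothetical two-entry pattern set, in the spirit of a regex-based scanner.
PATTERNS = {
    "Apache-2.0": re.compile(
        r"Licensed under the Apache License,\s+Version 2\.0", re.IGNORECASE
    ),
    "GPL-2.0-or-later": re.compile(
        r"GNU General Public License.*?version 2.*?any later version",
        re.IGNORECASE | re.DOTALL,
    ),
}


def words(text: str) -> str:
    """Lowercase the text and keep only alphanumeric words, single-spaced."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def detect_by_regex(text: str) -> List[str]:
    """Regex style: report every license whose telltale phrase occurs."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


def detect_by_full_text(text: str, references: Dict[str, str]) -> List[str]:
    """Full-text style: report licenses whose complete text occurs in the file."""
    haystack = f" {words(text)} "
    return sorted(
        name for name, ref in references.items() if f" {words(ref)} " in haystack
    )


header = 'Licensed under the Apache License, Version 2.0 (the "License");'
# Made-up reference text standing in for a full license text.
references = {"Example-License": "Permission is granted to use this\nsoftware for any purpose."}
source = "/*\n * Permission is granted to use this software\n * for any purpose.\n */"
print(detect_by_regex(header), detect_by_full_text(source, references))
```

The regex style catches short notices that merely mention a license, while the full-text style only fires when the whole text is present, which explains why the two datasets are counted so differently.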

scancode.io

ScanCode.io is a "server to script and automate the process of Software Composition Analysis (SCA) to identify any open source components and their license compliance data in an application’s codebase. ScanCode.io can be used for various use cases, such as Docker container and VM composition analyses, among other applications." It embeds ScanCode as its primary detection tool. ScanCode.io can analyze the licenses of a complete installed Debian system, such as a Docker image or a VM image, and provides a web UI, a JSON REST API, a CLI, JSON and XLSX outputs, and a plugin API for extending and creating custom analysis pipelines. It is written in Python with Django and PostgreSQL.

Obsolete code

OSLC

OSLCv3 (Open Source License Checker 3.0) is a "risk management tool for analyzing open source software licenses." It detects licenses using key sentences and diffs against a dataset of license texts (small: ~50). It is developed in Java and appears not to have been developed further since 2009.

ninka

Ninka is a "license identification tool for Source Code". Its approach is to detect licenses from text sentences using a dataset of key license sentences (large: ~600) and assemble the results based on the matched sentences. It outputs results in CSV format. Written in Perl. Unmaintained since 2017.
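The sentence-then-assemble approach can be illustrated with a short Python sketch. This is not Ninka's code or data: the four key sentences, the token names, the two rules and the "UNKNOWN" fallback are all simplified assumptions made for the example.

```python
import re
from typing import List

# Hypothetical, heavily shortened key sentences mapped to sentence tokens.
KEY_SENTENCES = {
    "BSDpre": "redistribution and use in source and binary forms",
    "BSDcondSource": "redistributions of source code must retain the above copyright notice",
    "BSDcondBinary": "redistributions in binary form must reproduce the above copyright notice",
    "BSDcondEndorse": "may be used to endorse or promote products derived from this software",
}

# Rules assemble sentence tokens into a license; most specific rule first.
RULES = [
    ("BSD-3-Clause", ["BSDpre", "BSDcondSource", "BSDcondBinary", "BSDcondEndorse"]),
    ("BSD-2-Clause", ["BSDpre", "BSDcondSource", "BSDcondBinary"]),
]


def normalize(sentence: str) -> str:
    """Lowercase and strip punctuation so formatting differences do not matter."""
    return " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))


def match_sentences(text: str) -> List[str]:
    """Split the text into sentences and return the tokens of those recognized."""
    tokens: List[str] = []
    for sentence in re.split(r"(?<=[.:;])\s+", text):
        normalized = normalize(sentence)
        for token, key in KEY_SENTENCES.items():
            if key in normalized:
                tokens.append(token)
    return tokens


def identify(text: str) -> str:
    """Return the first license whose required sentence tokens were all found."""
    found = set(match_sentences(text))
    for license_name, required in RULES:
        if found.issuperset(required):
            return license_name
    return "UNKNOWN"


sample = (
    "Redistribution and use in source and binary forms, with or without "
    "modification, are permitted provided that the following conditions are met: "
    "1. Redistributions of source code must retain the above copyright notice, "
    "this list of conditions and the following disclaimer. "
    "2. Redistributions in binary form must reproduce the above copyright notice, "
    "this list of conditions and the following disclaimer."
)
print(identify(sample))
```

Working per sentence means that rewrapped or reindented license text still matches, and the rule step is what distinguishes closely related licenses that share most of their sentences.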

jninka

jninka is a port from Perl to Java of ninka. Written in Java. Unmaintained/retired project.

slic

gerv/slic "Speedy LIcense Checker and associated tools". Written in Python. No longer maintained since the death of its author.

dlt

dlt has support for parsing and creating Debian machine readable copyright files. Written in Python. Unmaintained/retired project.

jfrog/go-license-discovery

  • jfrog/go-license-discovery "A go library for matching text against known OSS licenses". Written in Go. Uses google/licenseclassifier for license detection. No longer maintained.

codeauroraforum/lid

  • codeauroraforum/lid "License Identifier. The purpose of this program, license_identifier, is to scan the source code files and identify the license text region and the type of license." Written in Python. No longer maintained.
