debian/copyright is a machine-readable file in a Debian package that specifies your legal rights to use that package, and the files in it. For details about the process of writing the file, see debian/copyright. For general packaging information, see Packaging. For tools to create other parts of a system, see PackagingTools.

Command-line tools in Debian

Reviewing upstream packages to write debian/copyright files is tedious but important manual work. It is done during initial packaging and after every new upstream release. Picking a good tool will help ensure your package's quality.

licensecheck

licensecheck from licensecheck scans source code and reports copyright holders and known licenses. It detects licenses with a (large: >1,000) set of regex patterns and key phrases (parts), somewhat like Fossology/nomos and Ninka. It also detects copyright statements.

Written in Perl; outputs results in plain text (with customizable delimiter) or a Debian copyright file format.

# Check all files in the current directory:
licensecheck --check '.*' --recursive --deb-machine --lines 0 -- *
# Or just check files in the current git commit:
licensecheck --check '.*' --recursive --deb-machine --lines 0 -- $(git ls-tree -r --name-only HEAD)
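
To illustrate the pattern-based approach described above, here is a minimal Python sketch. It is not licensecheck's implementation: the three patterns, the copyright expression and all function names are assumptions made up for this example, and real tools rely on far larger and more careful pattern sets.

```python
import re

# Hypothetical, deliberately tiny pattern table (not licensecheck's own data).
LICENSE_PATTERNS = {
    "GPL-2+": re.compile(
        r"GNU General Public License.*?version 2 .*?"
        r"or \(at your option\) any later version",
        re.IGNORECASE,
    ),
    "Expat": re.compile(
        r"Permission is hereby granted, free of charge, to any person "
        r"obtaining a copy",
        re.IGNORECASE,
    ),
    "Apache-2.0": re.compile(
        r"Licensed under the Apache License, Version 2\.0", re.IGNORECASE
    ),
}

# Assumed shape of a copyright statement: "Copyright (C) YEARS HOLDER".
COPYRIGHT_RE = re.compile(
    r"Copyright\s+(?:\(C\)|©)?\s*"
    r"(?P<years>\d{4}(?:\s*[-,]\s*\d{4})*)\s+(?P<holder>[^\n]+)",
    re.IGNORECASE,
)

COMMENT_PREFIX = re.compile(r"^\s*(?:#+|//+|/\*+|\*+/?|;+|--)\s?")


def strip_comment_markers(text):
    """Drop leading comment decoration from every line."""
    return "\n".join(COMMENT_PREFIX.sub("", line) for line in text.splitlines())


def scan(text):
    """Return (license names, [(years, holder), ...]) found in text."""
    clean = strip_comment_markers(text)
    # Collapse whitespace so key phrases match across wrapped lines.
    flat = " ".join(clean.split())
    licenses = [name for name, pat in LICENSE_PATTERNS.items() if pat.search(flat)]
    holders = [(m["years"], m["holder"].strip()) for m in COPYRIGHT_RE.finditer(clean)]
    return licenses, holders


SAMPLE = """\
# Copyright (C) 2019-2021 Jane Doe <jane@example.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
"""

if __name__ == "__main__":
    print(scan(SAMPLE))
```

Stripping comment markers and collapsing whitespace before matching is the design choice that matters here: license grants are usually wrapped across several comment lines, so a pattern applied line by line would miss them.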

licensecheck does not extract metadata from binary files. If you need that, use it with exiftool:

1>&2 exiftool '-textOut!' %d%f.%e:meta -short -short -recurse -ext ttf .
licensecheck --copyright --deb-machine --recursive --lines 0 --check '.*' --ignore '.*\.ttf$' -- *
# Remove the temporary metadata files written by exiftool (not the fonts themselves):
find -type f -name '*.ttf:meta' -delete

Complex and actively-maintained examples can be found in e.g. ghostscript and emscripten.

licensing

licensing from licenseutils adds license boilerplate to new code, and can scan source code to report licenses.

Written in C.

licensing detect *

scan-copyrights

scan-copyrights from libconfig-model-dpkg-perl creates a copyright file from scratch.

Written in Perl; mainly uses licensecheck, and copyright update is handled by cme.

cme

The Config::Model framework provides data models for various formats, including Debian copyright files. cme provides a graphical editor based on Config::Model, which you can use to update or create Debian copyright files.

Written in Perl; uses licensecheck.

Main features:

# Non-interactive:
cme update dpkg-copyright
# Guided editing of fields:
cme update dpkg-copyright --edit

For details, see Updating debian copyright file with cme.

license-reconcile

license-reconcile reports discrepancies between the existing copyright file and the source code.

Written in Perl; uses licensecheck, and is partially replaced by licenserecon.

licenserecon

licenserecon was introduced in trixie. It reports differences between an existing DEP-5 copyright file and the output of licensecheck.

Run make clean (or equivalent) before licenserecon. Otherwise, results may be contaminated by spurious reports from the extra files.

The results are only intended to report obvious errors - you still need to check the project manually.

Despite attempts to minimise false positives, they will occur if files spell their license differently to debian/copyright. This is especially common with complex licensing such as 'and'/'or' constructs and specific exceptions. False positives can be suppressed with a local configuration file.

Only files with a copyright header are checked. False negatives may occur if licensecheck cannot determine a file's license. Files named "copyright", "copying", "readme" etc. are not checked, as they often state the licenses of other files rather than their own.
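
The kind of comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not licenserecon's code: the function names, the data shapes and the aliases mapping (standing in for a local configuration file) are all hypothetical. It relies on the rule from the machine-readable copyright format that the last matching Files stanza applies, and uses fnmatch as an approximation of the format's wildcards.

```python
from fnmatch import fnmatchcase


def declared_license(path, files_stanzas):
    """Return the license of the last Files stanza whose pattern matches path.

    files_stanzas is a list of (patterns, license) pairs in file order.
    """
    result = None
    for patterns, license_name in files_stanzas:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            result = license_name  # later stanzas override earlier ones
    return result


def reconcile(files_stanzas, detected, aliases=None):
    """Return [(path, declared, detected), ...] for every mismatch.

    detected maps path -> license name reported by a scanner (None if unknown).
    aliases maps alternative spellings to the name used in debian/copyright.
    """
    aliases = aliases or {}
    mismatches = []
    for path, found in sorted(detected.items()):
        if found is None:  # scanner could not tell: nothing to compare
            continue
        found = aliases.get(found, found)
        declared = declared_license(path, files_stanzas)
        if declared != found:
            mismatches.append((path, declared, found))
    return mismatches


# Hypothetical example data.
STANZAS = [(["*"], "GPL-2+"), (["lib/vendor/*"], "Expat")]
DETECTED = {
    "src/main.c": "GPL-2+",
    "src/util.c": "Apache-2.0",
    "lib/vendor/json.c": "MIT",
    "README": None,
}

if __name__ == "__main__":
    for row in reconcile(STANZAS, DETECTED, aliases={"MIT": "Expat"}):
        print(row)
```

Without the alias, lib/vendor/json.c would be reported even though "MIT" and "Expat" name the same license, which is the kind of spelling-related false positive mentioned above.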

debmake

debmake -k also reports discrepancies between the existing copyright file and the source code.

debmake -cc generates a new copyright file from the source code.

license-detector

license-detector scans for licenses within paths, and loosely summarizes a percentage covered by each SPDX license.

Written in Go.

decopy

decopy automates creating and updating debian/copyright.

Written in Python; uses python-debian to handle Debian machine-readable copyright files. Same approach to detection as licensecheck.

licensee

licensee from ruby-licensee checks LICENSE files and returns known license names.

Used by GitHub to provide the license summary on repository main pages. It searches for typical LICENSE filenames or a package manifest (NPM, Bower, Gemfile, etc.), then matches the license text against a (small: ~20) set of common license texts as published at https://choosealicense.com.
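
The two steps described above (find a likely license file, then compare its text with a small set of known texts) can be sketched as follows. This is a rough Python illustration, not licensee's algorithm: it uses difflib's similarity ratio as a stand-in, and the filename pattern, the 0.9 threshold and the two made-up reference texts are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher

# Assumed set of "typical" license filenames.
LICENSE_FILENAMES = re.compile(
    r"^(LICEN[SC]E|COPYING|UNLICENSE)(\.(txt|md|rst))?$", re.IGNORECASE
)

# Made-up reference texts; a real tool ships the full license texts.
KNOWN = {
    "Example-A": "Permission is granted to use this software for any purpose.",
    "Example-B": "You may not use this software without written consent of the author.",
}


def normalize(text):
    """Lower-case the text and keep only its words, ignoring layout."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_match(text, known, threshold=0.9):
    """Return (name, score) of the closest known text, or (None, score)."""
    words = normalize(text)
    scored = [
        (SequenceMatcher(None, words, normalize(reference)).ratio(), name)
        for name, reference in known.items()
    ]
    score, name = max(scored)
    return (name, score) if score >= threshold else (None, score)


if __name__ == "__main__":
    candidate = "Permission is granted to use this\nsoftware for ANY purpose!"
    print(best_match(candidate, KNOWN))
```

Comparing whole texts against a small reference set keeps false matches rare, but it also means that anything outside the set, or a license stated only in source headers, goes undetected.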

Written in Ruby; outputs results in YAML format.

check-all-the-things

Wrapper for some of the other tools listed here.

check-all-the-things -f copyright

cargo-lichking

Automated license checking for Rust. cargo lichking is a Cargo subcommand that checks licensing information for dependencies, based on David A. Wheeler's compatibility graph.

cargo lichking check

reuse

Checks and helps with compliance with the REUSE Initiative recommendations. Uses a combination of SPDX license identifiers and Debian machine-readable copyright files to document licenses.

Written in Python.

CDBS-based command-line tools in Debian

CDBS is deprecated in favour of Debhelper, but may still be used by Haskell programs.

/!\ Avoid these tools unless you are still using cdbs.

licensecheck2dep5

A script from cdbs can create a copyright file by tidying output from licensecheck.

Written in Perl; uses licensecheck.

licensecheck --check '.*' --recursive --copyright --deb-fmt --lines 0 * \
    | /usr/lib/cdbs/licensecheck2dep5

license-miner

A script from cdbs can extract structured metadata embedded in binary content, for subsequent parsing by licensecheck and suffix stripping by licensecheck2dep5.

Written in Perl; uses Image::ExifTool and Font::TTF.

find -type f -name '*.png' -print0 | perl -0 /usr/lib/cdbs/license-miner
licensecheck --check '.*' --ignore '.+\.png$' --recursive --copyright --deb-fmt --lines 0 * | /usr/lib/cdbs/licensecheck2dep5
find -type f -name '*.png.metadata' -delete

CDBS

A makefile from cdbs can automate selection, mining, parsing, and cleanup; and can compare previously autogenerated file debian/copyright_hints with the new file.

Written in make; uses license-miner, licensecheck and licensecheck2dep5.

Typical use is shipping a package-specific script debian/copyright-check with the source package and executing it manually when sources change:

export DEB_COPYRIGHT_EXTRACT_EXTS="icc pdf png ttf"
export DEB_COPYRIGHT_EXTRACT_PATHS_EXIF="Resource/Font/"
export DEB_COPYRIGHT_CHECK_IGNORE_EXTS="cat ico xls pcl xps"
export DEB_COPYRIGHT_CHECK_IGNORE_PATHS="doc/.*\.htm"
export DEB_COPYRIGHT_CHECK_MERGE_SAME_LICENSE=yes

make -f /usr/share/cdbs/1/rules/utils.mk pre-build || true
make -f /usr/share/cdbs/1/rules/utils.mk clean DEB_COPYRIGHT_CHECK_STRICT=1

Libraries in Debian

python-debian

python-debian can parse and create copyright files (and any Debian-style files such as description, control, Sources, Packages, etc.).

Written in Python.
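
To show the shape of the data such a parser works with, here is a small standard-library-only Python sketch that splits a machine-readable copyright file into stanzas. It deliberately does not use python-debian's API and skips most of the format's rules; the function name and sample contents are assumptions for illustration, and real packaging work should use the library.

```python
def parse_stanzas(text):
    """Split deb822-style text into a list of {field: value} dicts."""
    stanzas, current, field = [], {}, None
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current stanza
            if current:
                stanzas.append(current)
            current, field = {}, None
        elif line.startswith("#"):  # comment line
            continue
        elif line[0] in " \t" and field:  # continuation of the previous field
            current[field] += "\n" + line.strip()
        else:
            field, _, value = line.partition(":")
            current[field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


# Hypothetical sample file contents.
SAMPLE = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: example

Files: *
Copyright: 2020 Jane Doe
License: Expat

Files: debian/*
Copyright: 2021 John Roe
 2022 Jane Doe
License: GPL-2+
"""

if __name__ == "__main__":
    for stanza in parse_stanzas(SAMPLE):
        print(stanza)
```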

libsoftware-licensemoreutils-perl

libsoftware-licensemoreutils-perl can generate license summaries to go in copyright files. It's designed for use as a library, but can be called from the command-line:

LICENSE=GPL-2.0
# Note: perl requires the module name to follow -M directly, with no space.
perl -MSoftware::LicenseMoreUtils -e "print Software::LicenseMoreUtils->new_from_short_name({short_name => '$LICENSE'})->summary"

Command-line tools not in Debian

license_finder

LicenseFinder finds licenses for your project's dependencies.

It runs application-specific package management tools, detects package manifests (e.g. Gemfile), collects license-related metadata, and regex-detects licenses against a (small: ~20) set of common license texts and license names.

Written in Ruby; outputs results in CSV, HTML and other formats.

licensed

licensed checks the licenses of the dependencies of a project. Modern language package managers (bower, bundler, cabal, go, npm, stack) are used to pull the dependency chain of a specific project. Licenses can be configured to be either accepted or rejected, easing the developer task of identifying problematic dependencies when importing a new third-party library.

Written in Ruby; uses github/licensee for license-detection.
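
The accept/reject idea can be sketched in Python as below. This is an assumption-laden illustration rather than licensed's behaviour or configuration format: the function name, the three status strings and the sample dependency data are invented for the example.

```python
def check_dependencies(dependencies, allowed, denied):
    """Classify each dependency as 'ok', 'denied' or 'needs review'.

    dependencies maps a dependency name to its detected license identifier.
    """
    report = {}
    for name, license_id in sorted(dependencies.items()):
        if license_id in denied:
            report[name] = "denied"
        elif license_id in allowed:
            report[name] = "ok"
        else:
            # Unknown or unlisted licenses are surfaced, not silently accepted.
            report[name] = "needs review"
    return report


# Hypothetical policy and dependency list.
ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause"}
DENIED = {"Proprietary"}
DEPENDENCIES = {
    "left-pad": "MIT",
    "fastjson": "Apache-2.0",
    "blobdriver": "Proprietary",
    "oddlib": None,
}

if __name__ == "__main__":
    for name, status in check_dependencies(DEPENDENCIES, ALLOWED, DENIED).items():
        print(f"{name}: {status}")
```

Treating anything that is neither accepted nor rejected as "needs review" keeps a human in the loop for licenses the policy has not considered yet.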

scancode-toolkit

ScanCode detects licenses, copyrights and more.

It detects licenses using a (large: ~1,760) set of plain license texts available as an online licensedb, and a (large: ~30,000) library of license notices, mentions and references. ScanCode finds exact and approximate matches in source and binaries using checksums, automatons, and full text alignments (e.g. diffs) as well as SPDX license identifiers.

It can return the exact matched text, and the parts of a text that are not matched (e.g. added or removed). It detects and normalizes structured license tags in package manifests, including the ability to parse, detect and normalize Debian copyright files, with special support for structured DEP-5 machine-readable files using the debian-inspector library.

It can also output Debian copyright format, and can collect license information for installed packages by processing the status file.

It also detects copyright statements and collects license metadata from package manifests (e.g. Maven, npm, rpm, Debian, Cargo, Cocoapods, Bower, Composer, PyPI, Alpine, and many more).

Written in Python with some native C/C++ extensions; outputs JSON, YAML, HTML, Debian copyright or SPDX format.

See also Debian bug #983640.

apache-rat

Apache Creadur Rat is a release audit tool that improves accuracy and efficiency when checking releases. Its goal is to help Apache Foundation projects comply with the release policy, including detecting licenses. It uses a (small: ~20) set of key sentences.

Written in Java.

cargo-deny

cargo-deny is a cargo plugin that recursively checks project-wide licensing hints for all dependent Rust crates, and checks that they match a set of allow/deny conditions.

Written in Rust.

gitlog2copyright

gitlog2copyright prints "Copyright:" lines based on git history, with output formatted in a reasonable way. Can be useful for comparing with debian/copyright when indirect copyright clauses like "Copyright (C) The Authors" are used.
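
As a rough illustration of the idea (not gitlog2copyright's implementation), the Python sketch below groups commit years by author and prints one "Copyright:" line per author. The function names and the git log format string are choices made for this example, and it assumes that commit authorship approximates copyright ownership, which is not always true.

```python
import subprocess
from collections import defaultdict


def year_ranges(years):
    """Collapse years into ranges, e.g. {2019, 2020, 2022} -> '2019-2020, 2022'."""
    ordered = sorted(years)
    ranges, start, prev = [], ordered[0], ordered[0]
    for year in ordered[1:]:
        if year != prev + 1:  # gap found: close the current range
            ranges.append((start, prev))
            start = year
        prev = year
    ranges.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def copyright_lines(commits):
    """Build 'Copyright:' lines from an iterable of (year, author) pairs."""
    by_author = defaultdict(set)
    for year, author in commits:
        by_author[author].add(int(year))
    return [
        f"Copyright: {year_ranges(years)} {author}"
        for author, years in sorted(by_author.items())
    ]


def commits_from_git(repo="."):
    """Read (year, author) pairs from the git history of repo."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--date=format:%Y", "--format=%ad%x09%an"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [tuple(line.split("\t", 1)) for line in out.splitlines() if line]


if __name__ == "__main__":
    sample = [("2019", "Jane Doe"), ("2020", "Jane Doe"),
              ("2022", "Jane Doe"), ("2021", "John Roe")]
    print("\n".join(copyright_lines(sample)))
```

Passing the result of commits_from_git() to copyright_lines() would produce lines for a real repository, which can then be compared by hand with debian/copyright.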

Other tools that need further detailing and review

Applications

fossology

FOSSology is an open source license compliance software system and toolkit that can (in version 3.1) generate DEP-5 copyright files. It detects licenses with either a (large: ~2500) set of regex patterns (nomos) or a full string comparison against a (large: ~400) set of license full texts (monk). It also detects copyright statements and integrates with Ninka. This is a complete database-backed web application with some command-line support.

Written in C/C++ with a PHP frontend.

scancode.io

ScanCode.io is a server to script and automate the process of Software Composition Analysis (SCA), to identify any open source components and their license compliance data in an application’s codebase. It can be used for Docker container and VM composition analyses, among other applications. It embeds ScanCode as its primary detection tool. ScanCode.io can analyze the licensing of a complete installed Debian system, such as a Docker image or a VM image, and provides a web UI, a JSON REST API, a CLI, JSON and XLSX outputs, and a plugin API for extending and creating custom analysis pipelines.

Written in Python with Django and PostgreSQL.

Obsolete code

OSLC

OSLCv3 Open Source License Checker 3.0 is a risk management tool for analyzing open source software licenses. It detects licenses using key sentences and diffs using a (small: ~50) set of license texts.

Written in Java; seems unmaintained since 2009.

ninka

Ninka is a license identification tool for source code. It detects licenses from text sentences using a (large: ~600) set of key license sentences, and assembles the results based on the matched sentences.

Written in Perl; unmaintained since 2017, outputs in CSV format.

jninka

jninka is a port from Perl to Java of ninka.

Written in Java; unmaintained/retired.

slic

gerv/slic - "Speedy LIcense Checker and associated tools".

Written in Python; no longer maintained since the death of its author.

dlt

dlt has support for parsing and creating Debian machine-readable copyright files.

Written in Python; unmaintained/retired.

jfrog/go-license-discovery

jfrog/go-license-discovery - "A go library for matching text against known OSS licenses".

Written in Go; uses google/licenseclassifier for license detection, no longer maintained.

codeauroraforum/lid

codeauroraforum/lid - "License Identifier. The purpose of this program, license_identifier, is to scan the source code files and identify the license text region and the type of license.".

Written in Python; no longer maintained.
