## page was renamed from UrheberrechtsüberprüfungsWerkzeuge
## Note on editing: Please use semantic newlines, to ease readability (also of emailed diffs).
#language de
||~-[[DebianWiki/EditorGuide#translation|Translation(s)]]: Deutsch - [[CopyrightReviewTools|English]] -~ || (!) [[/Discussion|Discussion]]||

== Command-line tools in Debian ==

Reviewing upstream packages to write debian/copyright files is a tedious but important manual task.
It is done during initial packaging and after every new upstream release.
Making the initial creation of copyright files and their subsequent review and update easier will improve the software quality of Debian.
Starting with Stretch (Debian 9), the tools that help with this are much improved compared with earlier releases.

=== licensecheck ===

`licensecheck` from the package DebianPackage:licensecheck (and older versions from the package DebianPackage:devscripts) can scan source code and report the copyright holders and known licenses it finds.
Its approach is to detect licenses using a dataset (medium: ~200 regular expressions) of regular expression patterns and key phrases (parts), and to reassemble these into recognized licenses based on rules.
In this sense it is somewhat similar to the combined approaches of FOSSology/nomos and Ninka (see below for these tools).
It also detects copyright statements.
It outputs results in plain text (with a customizable delimiter) or in the Debian copyright file format (DEP-5).
It is written in [[Perl]].

{{{
licensecheck --check '.*' --recursive --deb-machine --lines 0 *
}}}

licensecheck does [[DebianBug:828941|not yet]] extract metadata from binary files;
a suggested workaround is to pair it with [[DebianPackage:libimage-exiftool-perl|exiftool]] as follows:

{{{
# Write the metadata of every .ttf file to a sidecar file named <file>.ttf:meta
1>&2 exiftool '-textOut!' %d%f.%e:meta -short -short -recurse -ext ttf .
# Scan everything except the binary fonts themselves
licensecheck --copyright --deb-machine --recursive --lines 0 --check '.*' --ignore '.*\.ttf$' -- *
# Remove the sidecar files again
find -type f -name '*.ttf:meta' -delete
}}}

For more elaborate and actively maintained examples, see [[https://salsa.debian.org/printing-team/ghostscript/-/blob/debian/latest/debian/copyright-check|ghostscript]] and [[https://salsa.debian.org/js-team/emscripten/-/blob/debian/latest/debian/copyright-check|emscripten]].

=== licensing ===

`licensing` from the package DebianPackage:licenseutils is primarily meant for adding license boilerplate to new code, but it can also scan source code and report the licenses it finds.
It is written in C.

{{{
licensing detect *
}}}

(2020-09-19: bug DebianBug:970580 filed)

=== scan-copyrights ===

`scan-copyrights` from the package DebianPackage:libconfig-model-dpkg-perl can update an existing copyright file by rescanning the source.
It can also create one from scratch.
It is written in [[Perl]] and uses [[#licensecheck|licensecheck]].

=== cme ===

Config::Model can update Debian copyright files with the `cme` command (from the package DebianPackage:cme, or DebianPackage:libconfig-model-dpkg-perl in versions before 2.063).
It is written in [[Perl]] and uses [[#licensecheck|licensecheck]].

{{{
cme update dpkg-copyright
}}}

Usage is described in detail on the [[https://github.com/dod38fr/config-model/wiki/Updating-debian-copyright-file-with-cme|Config::Model wiki]].

=== licensecheck2dep5 ===

A script from DebianPackage:cdbs can create a copyright file by tidying up the output of `licensecheck`.
Written in Perl, using [[#licensecheck]].
{{{
licensecheck --check '.*' --recursive --copyright --deb-fmt --lines 0 * | /usr/lib/cdbs/licensecheck2dep5
}}}

/!\ This tool is discouraged; use [[#licensecheck|licensecheck with exiftool]] instead.

=== license-miner ===

A script from DebianPackage:cdbs can extract structured metadata embedded in binary content, for subsequent parsing by [[#licensecheck]] and suffix removal by [[#licensecheck2dep5]].
Written in Perl, using `Image::ExifTool` and `Font::TTF`.

{{{
find -type f -name '*.png' -print0 | perl -0 /usr/lib/cdbs/license-miner
licensecheck --check '.*' --ignore '.+\.png$' --recursive --copyright --deb-fmt --lines 0 * | /usr/lib/cdbs/licensecheck2dep5
find -type f -name '*.png.metadata' -delete
}}}

=== CDBS ===

A makefile from DebianPackage:cdbs can automate the selection, mining, parsing and cleanup,
compare the previously generated file `debian/copyright_hints` shipped in the source package with a newly generated instance,
and warn about newly introduced (but not disappearing) changes to the hints found,
using [[#license-miner]], [[#licensecheck]] and [[#licensecheck2dep5]] under the hood.
Written in make.
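
The "warn about newly introduced, but not disappearing, hints" behaviour described above can be illustrated with a minimal shell sketch.
It is not the CDBS implementation: the function name `new_hints` and the sample file contents are hypothetical, and a plain whole-line comparison with `grep` stands in for whatever comparison CDBS actually performs.

{{{
#!/bin/sh
# Hypothetical helper, not part of cdbs: print the lines that occur in the
# new hints file ($2) but not in the old one ($1).
# Lines that merely disappeared from the old file are deliberately ignored.
new_hints() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$1" "$2" || true
}

# Example run with two throwaway files
old=$(mktemp)
new=$(mktemp)
printf '%s\n' 'License: GPL-2+' 'License: Expat' > "$old"
printf '%s\n' 'License: GPL-2+' 'License: Apache-2.0' > "$new"
new_hints "$old" "$new"    # prints "License: Apache-2.0"
rm -f "$old" "$new"
}}}
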

Typically, a package-specific script `debian/copyright-check` is shipped with the source package and run manually (not as part of the normal build) whenever the sources change:

{{{
#!/bin/sh
export DEB_COPYRIGHT_EXTRACT_EXTS="icc pdf png ttf"
export DEB_COPYRIGHT_EXTRACT_PATHS_EXIF="Resource/Font/"
export DEB_COPYRIGHT_CHECK_IGNORE_EXTS="cat ico xls pcl xps"
export DEB_COPYRIGHT_CHECK_IGNORE_PATHS="doc/.*\.htm"
export DEB_COPYRIGHT_CHECK_MERGE_SAME_LICENSE=yes
make -f /usr/share/cdbs/1/rules/utils.mk pre-build || true
make -f /usr/share/cdbs/1/rules/utils.mk clean DEB_COPYRIGHT_CHECK_STRICT=1
}}}

/!\ This tool is discouraged; use [[#licensecheck|licensecheck with exiftool]] instead.

=== license-reconcile ===

`license-reconcile` compares the existing copyright file with the source code and reports discrepancies.
Written in Perl, using [[#licensecheck|licensecheck]].

=== debmake ===

`debmake -k` also compares the existing copyright file with the source code and reports discrepancies.
`debmake -cc` generates a new copyright file from the source code.

=== decopy ===

[[https://salsa.debian.org/debian/decopy|decopy]] is a tool that "automates creating and updating the debian/copyright files."
It also "aims to detects as many licenses as possible", which makes it a tool for license detection too.
It uses `python-debian` to handle Debian machine-readable copyright files.
Its approach to detecting licenses is the same as `license-checker`.
Written in Python, using [[#python-debian|python-debian]].

=== licensee ===

`licensee` from DebianPackage:ruby-licensee checks LICENSE files and returns known license names.
This is the [[https://github.com/benbalter/licensee|tool used by GitHub]] to provide a summary license indication on a repository's main page.
Its approach is to search for typical LICENSE file names or a package manifest (NPM, Bower, Gemfile, etc.) and to match the license text, exactly or approximately, against the set of common license texts published at [[https://choosealicense.com]] (small: ~20).
It outputs results in YAML format.
Written in Ruby.

=== check-all-the-things ===

Wrapper for some of the other tools listed here.

{{{
check-all-the-things -f copyright
}}}

=== cargo-lichking ===

Automated license checking for Rust.
`cargo lichking` is a Cargo subcommand that checks licensing information for dependencies, based on [[http://www.dwheeler.com/essays/floss-license-slide.html|David A. Wheeler's compatibility graph]].

{{{
cargo lichking check
}}}

== Libraries in Debian ==

=== python-debian ===

[[https://packages.qa.debian.org/p/python-debian.html|python-debian]] supports parsing and creating copyright files (and any Debian-style files such as description, control, Sources, Packages, etc.).
Written in Python.

== Command-line tools not in Debian ==

=== license_finder ===

[[https://github.com/pivotal/LicenseFinder|LicenseFinder]] is a tool that "Find licenses for your project's dependencies."
It does so by running application-specific package management tools and detecting package manifests (e.g. Gemfile) to collect license-related metadata, and it detects licensing using regexes against a set of common license texts (small: ~20).
It outputs results in CSV, HTML and other report formats.
Written in Ruby.

=== licensed ===

[[https://github.com/github/licensed|licensed]] is a tool released by GitHub to check the licenses of the dependencies of a project.
Modern language package managers (bower, bundler, cabal, go, npm, stack) are used to pull the dependency chain of a specific project.
Licenses can be configured to be either accepted or rejected, easing the developer's task of identifying problematic dependencies when importing a new third-party library.
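
The accept/reject idea can be sketched in a few lines of shell.
This only illustrates the policy check; it is not how `licensed` is configured or implemented.
The function name `classify_license`, the dependency names and the license lists are assumptions made up for this sketch.

{{{
#!/bin/sh
# Hypothetical policy check: map a license identifier to a verdict.
# The lists below are arbitrary examples, not a recommendation.
classify_license() {
    case "$1" in
        MIT|BSD-3-Clause|Apache-2.0) echo accepted ;;
        SSPL-1.0|Proprietary)        echo rejected ;;
        *)                           echo "needs review" ;;
    esac
}

# Input: one "dependency license" pair per line, as a dependency scan might emit.
printf '%s\n' 'libfoo MIT' 'libbar SSPL-1.0' 'libbaz WTFPL' |
while read -r dep license; do
    printf '%s: %s (%s)\n' "$dep" "$(classify_license "$license")" "$license"
done
}}}

Anything that is on neither list is reported for manual review rather than silently accepted.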

=== scancode-toolkit ===

[[https://github.com/nexB/scancode-toolkit/|ScanCode]] is a tool "to scan code and detect licenses, copyrights and more".
Its approach is to detect licenses using a dataset of plain license texts (large: ~1,500 texts) and plain text notices (large: ~15,000 notices and mentions), finding exact and approximate matches in source and binaries using full-text alignment.
It can also return the exact matched text.
It also detects copyright statements and collects license metadata from package manifests (e.g. Maven, PyPI, etc.).
It outputs results in JSON, HTML or SPDX format.
Written in Python.

=== apache-rat ===

[[https://github.com/apache/creadur-rat/|Apache Creadur rat]] is a "tool to improve accuracy and efficiency when checking releases."
Its goal is to help Apache Foundation projects comply with the release policy, including detecting licenses.
Its approach is to use a dataset of key sentences (small: ~20).
Written in Java.

=== Other tools that need further detailing and review ===

 * [[https://github.com/daald/dpkg-licenses|daald/dpkg-licenses]] "A command line tool which lists the licenses of all installed packages in a Debian-based system (like Ubuntu)". Written in Shell script.
 * [[https://github.com/mwittig/npm-license-crawler|mwittig/npm-license-crawler]] "Analyzes license information for multiple node.js modules (package.json files) as part of your software project". Written in JavaScript.
 * [[https://github.com/fossology/atarashi|fossology/atarashi]] "Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology". Written in Python.
 * [[https://git.fsfe.org/reuse/tool|FSFE's reuse/tool]] "The tool for checking and helping with compliance with the REUSE Initiative recommendations https://reuse.software ". Written in Python.
 * [[https://github.com/codeauroraforum/lid|codeauroraforum/lid]] "License Identifier. The purpose of this program, license_identifier, is to scan the source code files and identify the license text region and the type of license.". Written in Python.
 * [[https://github.com/heremaps/oss-review-toolkit|heremaps/oss-review-toolkit]] "A suite of tools to assist with reviewing Open Source Software dependencies. http://oss-review-toolkit.org/ ". Written in Kotlin.
 * [[https://github.com/google/licenseclassifier|google/licenseclassifier]] "A License Classifier". Written in Go.
 * [[https://github.com/google/licensecheck|google/licensecheck]] "The licensecheck package classifies license files and heuristically determines how well they correspond to known open source licenses". Written in Go.
 * [[https://github.com/google/go-licenses|google/go-licenses]] "Reports on the licenses used by a Go package and its dependencies". Written in Go.
 * [[https://github.com/jfrog/go-license-discovery|jfrog/go-license-discovery]] "A go library for matching text against known OSS licenses". Written in Go.
 * [[https://github.com/src-d/go-license-detector|src-d/go-license-detector]] "Reliable project licenses detector." Written in Go. Less active since source-d closed shop.
 * [[https://github.com/amzn/askalono|amzn/askalono]] "A tool & library to detect open source licenses from texts https://amzn.to/askalono". Written in Rust.
 * [[https://github.com/boyter/lc|boyter/lc]] "licensechecker (lc) a command line application which scans directories and identifies what software license things are under producing reports as either SPDX, CSV, JSON, XLSX or CLI Tabular output". Written in Go.
 * [[https://github.com/nexB/debut|nexB/debut]] "A python library to parse Debian deb822-style control and copyright files". Written in Python.

== Applications ==

=== fossology ===

[[https://www.fossology.org/|FOSSology]] is an open source license compliance software system and toolkit that [[https://debconf16.debconf.org/talks/100/|can]] (in version 3.1) generate DEP5 copyright files.
Its approach is to detect licenses with either a dataset of regex patterns (nomos; large: ~2,500 regexes) or a full string comparison against full license texts (monk; large: ~400 texts).
It also detects copyright statements and integrates with Ninka (see below).
It is a complete database-backed web application with some command-line support, written in C/C++ with a PHP frontend.

== Obsolete code ==

=== OSLC ===

[[https://forge.ow2.org/projects/oslcv3/|OSLCv3]] Open Source License Checker 3.0 is a "risk management tool for analyzing open source software licenses."
It detects licenses using key sentences and diffs against a dataset of license texts (small: ~50).
It was developed in Java and appears to have been inactive since 2009.

=== ninka ===

[[https://github.com/dmgerman/ninka|Ninka]] is a "license identification tool for Source Code".
Its approach is to detect licenses from text sentences using a dataset of key license sentences (large: ~600) and to assemble the results based on the matched sentences.
It outputs results in CSV format.
Written in Perl.
Unmaintained since 2017.

=== jninka ===

[[https://github.com/whitesource/jninka/|jninka]] is a port of `ninka` from Perl to Java.
Written in Java.
Unmaintained/retired project.

=== slic ===

[[https://github.com/gerv/slic|gerv/slic]] "Speedy LIcense Checker and associated tools".
Written in Python.
No longer maintained since the death of its author.

=== dlt ===

[[https://github.com/agustinhenze/dlt/|dlt]] has support for parsing and creating Debian machine-readable copyright files.
Written in Python.
Unmaintained/retired project.
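
For orientation, the machine-readable copyright files that parsers such as dlt and python-debian handle consist of blank-line-separated paragraphs of `Field: value` lines.
The following awk sketch lists the short license name of each `Files` paragraph.
It assumes a deliberately simplified input: the sample content and the function name `list_file_licenses` are made up here, and multi-line field values are not handled.

{{{
#!/bin/sh
# Hypothetical helper: read a simplified machine-readable copyright file
# on stdin and print "<files pattern> => <license short name>" per paragraph.
list_file_licenses() {
    awk '
        function emit() {
            if (files != "" && license != "") print files " => " license
            files = ""; license = ""
        }
        /^Files:/   { v = $0; sub(/^Files:[ \t]*/, "", v);   files = v }
        /^License:/ { v = $0; sub(/^License:[ \t]*/, "", v); license = v }
        # A blank line (or the end of input) closes the current paragraph.
        /^[ \t]*$/  { emit() }
        END         { emit() }
    '
}

list_file_licenses <<'EOF'
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

Files: *
Copyright: 2020 Jane Example
License: GPL-2+

Files: debian/*
Copyright: 2021 John Example
License: Expat
EOF
}}}

Paragraphs without a `Files` field, such as the header paragraph, are skipped.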

== See also ==

 * [[https://github.com/dod38fr/config-model/wiki/Updating-debian-copyright-file-with-cme|Updating debian copyright file with cme]] by Dominique Dumont
 * [[http://people.skolelinux.org/pere/blog/Creating__updating_and_checking_debian_copyright_semi_automatically.html|Creating, updating and checking debian/copyright semi-automatically]] by Petter Reinholdtsen
 * Bachelor thesis: [[https://osr.cs.fau.de/2019/08/07/final-thesis-a-comparison-study-of-open-source-license-crawler/|A Comparison Study of Open Source License Crawlers]] ([[https://osr.cs.fau.de/wp-content/uploads/2019/08/wolter_2019.pdf|PDF]], [[https://web.archive.org/web/20200128142101/https://osr.cs.fau.de/wp-content/uploads/2019/08/wolter_2019.pdf|archived PDF]]) by Thomas Wolter
 * [[CopyrightReview|Peer review for copyright files]]
 * [[https://github.com/maxhbr/LicenseScannerComparison]] A comparison of license scanners.
 * [[https://clearlydefined.io/|ClearlyDefined]] Massive license scanning (with scancode) and peer review for license clarity and correctness.