This is a proposal to make debian/copyright machine-interpretable. This file is one of the most important files in Debian packaging, yet its existing format is vague and varies tremendously across packages, making it difficult to automatically parse.

This is not a proposal to change the policy in the short term.


Rationale

The diversity of free software licenses means that Debian needs to care not only about the freeness of a given work, but also its license's compatibility with the other parts of Debian it uses.

The arrival of the GPL version 3, its incompatibility with version 2, and our inability to identify the software where that incompatibility might be a problem is one prominent example of this need.

There are also earlier precedents. One is the GPL/OpenSSL incompatibility. Apart from grepping debian/copyright, which is prone to numerous false positives (packaging under the GPL but software under another license) and false negatives (GPL software with an "OpenSSL special exception" form of dual licensing), there is no reliable way to know which software in Debian might be problematic.

And there is more to come. There are issues with shipping GPLv2-only software with a CDDL operating system such as Nexenta. The GPL version 3 solves this issue, but not all GPL software can switch to it and we have no way to know how much of Debian should be stripped from such a system.

Compatibility and Human-Readability

The file must be encoded as UTF-8 and strictly formatted as a superset of RFC2822 including significant newlines. Free-form text is not allowed.

The debian/copyright file must be machine-interpretable, yet human-readable, while communicating all mandated upstream information, copyright notices and licensing details.

For the sake of human-readability this proposal avoids any complex field names or syntax rules.

Lintian

You can discuss implementation details in [http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=478930 bug 478930] -- MathieuParent

Implementation

Sections

Header Section (Once)

The header should be RFC2822-compliant, consisting of multiple fields.

I am not convinced that Upstream-Vcs-Browser or Upstream-Vcs-URI belong in debian/copyright. Perhaps debian/README.source would be a better place for this. In either case, this is way too complex as it stands. What purpose does it serve? Can we just stick with vanilla URIs please? -- Noah Slater

It is machine readable data, similar to Upstream-Source. MIME-like tags are easy for machine processing. -- JariAalto

Jari, the question (as I understand it) is not about the *format* of this proposed data, but rather whether it belongs in debian/copyright at all, rather than, say, debian/README.source. I'm in agreement with Noah on this: it doesn't seem debian/copyright is the right place for this. —BenFinney

Correct. I was (perhaps confusingly) picking up on two issues here. The proposal for Upstream-Vcs-URI as it stands is way too complex for any use case I can imagine. Who would care about the "priority=50" value, for example? Being machine readable alone is not justification for its inclusion in any part of the Debian packaging. The second issue I see is that I think this belongs in debian/README.source. -- NoahSlater

Well, the debian/copyright file is shown on packages.debian.org, whereas the content of debian/README.source is not. I see Upstream-Vcs-URI as analogous to the Upstream-Source field, differing only in its scheme, which is a more modern access method. The upstream VCS information and the download location are tightly coupled, and it would be beneficial to keep them in the same place. The fields were only examples of how the information can be extended if presented using the MIME notation (most likely there will only be a need for type and uri). -- JariAalto

Example:

Format-Specification:
    http://wiki.debian.org/Proposals/CopyrightFormat?action=recall&rev=196
Upstream-Name: SOFTware
Upstream-Maintainer: John Soft <some.email@example.com>
Upstream-Source: http://sourceforge.net/projects/software/

Files Section (Repeatable)

The declaration of copyright and license for files is done in one or more stanzas, formatted as RFC2822-style fields. Each stanza is separated from others by a blank line.

There is currently a [http://lists.debian.org/debian-devel-announce/2006/03/msg00023.html rather definite request (requirement?)] that copyright files include not just an assertion that the code is under a particular licence but also the text that states that this is the case (e.g. the standard GPL statement used by many: "This program is free software ... You should have received ..."). It appears that this is inconsistent with the currently proposed format. Are the ftp-masters happy with the format as described here and what amounts to the removal of this statement from the copyright file? Or is it envisaged that the text after the "License:" include this statement? If this is the case, then some clarification is required and the examples should do this. -- StuartPrescott

That email is over two years old now. I can only assume, given the large number of people and packages using this copyright format as it stands, that the FTP Masters are happy with it to date. -- NoahSlater

Example:

Files: *
Copyright: Copyright 2008, Sam Ruby <rubys@intertwingly.net>
Copyright: © 2007, Scott James Remnant <scott@netsplit.com>
Copyright: Copyright 1998, 2000, 2002-2008 John William Aaron van der Smith
 <john.william.van.der.smith@somereally.reallylong.example.com>
License: PSF-2
 [LICENSE TEXT]
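
The stanza layout described above can be read with very little code. The following is only a minimal sketch, not a reference parser: the function name parse_stanzas and the SAMPLE text are made up for illustration, and it assumes that any line starting with a space or tab continues the previous field. Fields are kept as an ordered list of pairs rather than a dictionary because a field such as Copyright may appear several times in one stanza.

```python
def parse_stanzas(text):
    """Split a copyright file into stanzas.

    Each stanza is a list of (field, value) pairs in file order.
    Hypothetical helper for illustration, not part of the proposal.
    """
    stanzas, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
                current = []
        elif line[0] in " \t":
            # Continuation line: append it to the most recent field's value.
            if not current:
                raise ValueError("continuation line without a field")
            name, value = current[-1]
            current[-1] = (name, (value + "\n" + line.strip()).strip())
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("not a field: %r" % line)
            current.append((name.strip(), value.strip()))
    if current:
        stanzas.append(current)
    return stanzas


SAMPLE = """\
Files: *
Copyright: Copyright 2008, Sam Ruby <rubys@intertwingly.net>
Copyright: Copyright 1998, 2000, 2002-2008 John William Aaron van der Smith
 <john.william.van.der.smith@somereally.reallylong.example.com>
License: PSF-2
 [LICENSE TEXT]

License: MPL-1.1
 [LICENSE TEXT]
"""

stanzas = parse_stanzas(SAMPLE)
```

Here stanzas holds two entries: the Files stanza with four fields, and the standalone License stanza with one.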

Standalone License Section

Where a set of files is dual (tri, etc.) licensed, you must use a single-line License field and use standalone License fields to expand the license keywords.

Files: src/js/editline/*
Copyright: Copyright 1993, Simmule Turner
Copyright: Copyright 1993, Rich Salz
License: MPL-1.1 | GPL-2 | LGPL-2.1

License: MPL-1.1
 [LICENSE TEXT]

License: GPL-2
 [LICENSE TEXT]

License: LGPL-2.1
 [LICENSE TEXT]

Where multiple sets of files use the same license, you can avoid repetition by using a single-line License field and a separate standalone License field to expand the license keyword.

Files: src/js/editline/*
Copyright: Copyright 1993, Simmule Turner
Copyright: Copyright 1993, Rich Salz
License: MPL-1.1

Files: src/js/fdlibm/*
Copyright: Copyright 1993, Sun Microsystems Corporation
License: MPL-1.1

License: MPL-1.1
 [LICENSE TEXT]

License Aliases

If a common type of license is a combination of multiple licenses (like the Perl license), an alias can be defined, making it clear that this particular combination of licenses is meant and not just any combination.

Files: *
License: Perl

License-Alias: Perl
Licenses: GPL-1+ | Artistic

License: GPL-1+
 [LICENSE TEXT]

License: Artistic
 [LICENSE TEXT]
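
A tool reading such a file would need to replace an alias with the expression it stands for before reasoning about the licenses. The sketch below shows one possible way, under stated assumptions: the helper names collect_aliases and resolve are hypothetical, stanzas are represented as lists of (field, value) pairs, and alias names are compared case-insensitively in the same way as license names.

```python
def collect_aliases(stanzas):
    """Map each License-Alias name (lower-cased) to its Licenses expression."""
    aliases = {}
    for stanza in stanzas:
        fields = dict(stanza)
        if "License-Alias" in fields:
            aliases[fields["License-Alias"].lower()] = fields.get("Licenses", "")
    return aliases


def resolve(keyword, aliases):
    """Return the expression behind an alias, or the keyword itself."""
    return aliases.get(keyword.lower(), keyword)


# Stanzas as (field, value) pairs, mirroring the Perl example.
stanzas = [
    [("Files", "*"), ("License", "Perl")],
    [("License-Alias", "Perl"), ("Licenses", "GPL-1+ | Artistic")],
]
aliases = collect_aliases(stanzas)
expanded = resolve("Perl", aliases)
```

With the Perl example, expanded is the string "GPL-1+ | Artistic", while a keyword that is not an alias is returned unchanged.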

Fields Detail

Files

Format

The value of the Files field should be a list of comma-separated values:

Files: foo.c, bar.*, baz.[ch]

File names containing spaces or commas should be put within double quotes. The backslash character is an escaping character, be it inside or outside double quotes:

Files: "Program Files/*", manual\[english\].txt
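
The quoting rules can be handled by a small character scanner. This is a sketch under assumptions rather than a definitive implementation: split_files_field is a hypothetical name, an escaped character is kept together with its backslash so that a later pattern matcher can still tell it apart from a wildcard, and whitespace at the edges of each pattern is dropped even inside quotes.

```python
def split_files_field(value):
    """Split a Files value on commas, honouring double quotes and backslashes."""
    patterns, current, in_quotes, i = [], [], False, 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # Keep the backslash and the escaped character together.
            current.append(value[i:i + 2])
            i += 2
            continue
        if ch == '"':
            # Quotes only toggle the quoting state; they are not part of the name.
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            patterns.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated double quote")
    patterns.append("".join(current).strip())
    # Drop empty entries such as those produced by a trailing comma.
    return [p for p in patterns if p]


plain = split_files_field("foo.c, bar.*, baz.[ch]")
quoted = split_files_field('"Program Files/*", manual\\[english\\].txt')
```

For the two examples above, plain contains three patterns and quoted contains two, the first of which includes the space in "Program Files/*".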

Syntax

Patterns are the ones recognised by the find utility's -name and -wholename flags. They behave as if find had been called in the following way from the top source directory:

find . -wholename "$PATTERN"

This will match all Makefile.am files in the tree and all Python scripts:

Files: */Makefile.am, *.py

But this will only match the top-level Makefile.am:

Files: ./Makefile.am

Special rule: if a pattern $PATTERN does not match any file in the source, it is implicitly considered to be expanded to */$PATTERN. This is to avoid insane verbosity when referring to a unique file buried deep in the tree.
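
The matching rules, including the fallback to */$PATTERN, can be approximated with Python's fnmatch module, whose * also matches across "/" as find's -wholename does. This is a sketch only: expand_pattern and the tree list are hypothetical, the paths are assumed to be written as find . prints them (with a leading "./"), and fnmatch does not implement the backslash escaping described above, so escaped patterns would need extra handling.

```python
import fnmatch


def expand_pattern(pattern, paths):
    """Return the paths matched by one Files pattern.

    `paths` are relative paths as printed by `find .`, e.g. "./src/main.c".
    """
    hits = [p for p in paths if fnmatch.fnmatchcase(p, pattern)]
    if not hits:
        # Special rule: a pattern that matches nothing is retried as */PATTERN.
        hits = [p for p in paths if fnmatch.fnmatchcase(p, "*/" + pattern)]
    return hits


# A made-up source tree for illustration.
tree = [
    "./Makefile.am",
    "./src/Makefile.am",
    "./tools/gen.py",
    "./src/deep/getopt.c",
]

all_makefiles = expand_pattern("*/Makefile.am", tree)
top_makefile = expand_pattern("./Makefile.am", tree)
buried = expand_pattern("getopt.c", tree)
```

In this made-up tree, the bare pattern getopt.c matches nothing at first and is therefore retried as */getopt.c, which finds the file under src/deep/.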

Match Order

It is quite common for a work to have files with copyright held by different parties and received under different licenses. To allow this, multiple stanzas are allowed with different Files declarations.

However, it makes for easier reading if the copyright file lists the "main" license first: the one matching the "top level" of the work, with others listed as exceptions. To allow this, the following precedence rule applies for matching files: if multiple Files declarations match the same file, then only the last match counts.

As a result, it is recommended for clarity that the stanzas appear in order from most general (e.g. Files: *) first, through to most specific. In the following example, the file getopt.c matches both Files: * and Files: getopt.*; only the last match counts, so the file getopt.c has the license declaration License: other-BSD.

Files: *
Copyright: Copyright 2003-2005, Fred J. Bloggs <fred@example.com>
License: [the main work's license]
 [LICENSE TEXT]

Files: getopt.*
Copyright: Copyright 2000, the NetBSD Foundation, Inc.
License: other-BSD
 [LICENSE TEXT]

Files: debian/*
Copyright: Copyright [years], [the debian package copyright holder]
License: [the debian package license]
 [LICENSE TEXT]

It is very common for the Debian packaging work to have a different copyright holder and/or license from the upstream work. In these cases, it is important that the debian/* pattern is placed after any other conflicting patterns.
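
The "last match counts" rule amounts to scanning the stanzas in file order and letting each later match overwrite the earlier one. A minimal sketch follows, with several assumptions: license_for is a hypothetical helper, each stanza is reduced to a (patterns, license keyword) pair, paths are written relative to the top of the source tree without a leading "./", the */$PATTERN fallback is ignored, and the GPL-2+ and GAP keywords merely stand in for the placeholders in the example.

```python
import fnmatch


def license_for(path, stanzas):
    """Return the License keyword of the last stanza that matches path."""
    result = None
    for patterns, license_name in stanzas:
        if any(fnmatch.fnmatchcase(path, pat) for pat in patterns):
            # Later stanzas override earlier ones.
            result = license_name
    return result


# Stanzas in file order, from most general to most specific.
stanzas = [
    (["*"], "GPL-2+"),            # stand-in for the main work's license
    (["getopt.*"], "other-BSD"),
    (["debian/*"], "GAP"),        # stand-in for the packaging license
]

getopt_license = license_for("getopt.c", stanzas)
packaging_license = license_for("debian/rules", stanzas)
```

Here getopt.c matches both the first and the second stanza and ends up with other-BSD, while debian/rules is only overridden by the final debian/* stanza.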

License

Keywords

The "License" field, to be machine-parseable, should not contain arbitrary values. There needs to be a list of accepted keywords which have a very specific, unambiguous meaning. Here is a non-exhaustive list; please help fill it with popular license names we're likely to meet in Debian:

|| keyword || meaning ||
|| GPL-any || GNU General Public License, author did not specify version (probably the same as GPL-1+) ||
|| GPL-1 || GNU General Public License, version 1 only ||
|| GPL-1+ || GNU General Public License, version 1 or later (probably the same as GPL-any) ||
|| GPL-2 || GNU General Public License, version 2 only ||
|| GPL-2+ || GNU General Public License, version 2 or later ||
|| GPL-3 || GNU General Public License, version 3 only ||
|| GPL-3+ || GNU General Public License, version 3 or later ||
|| LGPL-any || GNU Library/Lesser General Public License, author did not specify version ||
|| LGPL-2 || GNU Library General Public License, version 2 only ||
|| LGPL-2+ || GNU Library General Public License, version 2 or later ||
|| LGPL-2.1 || GNU Lesser General Public License, version 2.1 only ||
|| LGPL-2.1+ || GNU Lesser General Public License, version 2.1 or later ||
|| LGPL-3 || GNU Lesser General Public License, version 3 only ||
|| LGPL-3+ || GNU Lesser General Public License, version 3 or later ||
|| PSF || Python License, author did not specify version ||
|| PSF-2 || Python License, version 2 only ||
|| GFDL-any || GNU Free Documentation License, author did not specify version (maybe this needs mention of the fact that we accept no invariant sections, etc.) ||
|| GFDL-1.1 || GNU Free Documentation License, version 1.1 only (same note as above) ||
|| GFDL-1.1+ || GNU Free Documentation License, version 1.1 or newer (same note as above) ||
|| GFDL-1.2 || GNU Free Documentation License, version 1.2 only (same note as above) ||
|| GFDL-1.2+ || GNU Free Documentation License, version 1.2 or newer (same note as above) ||
|| GAP || GNU All-Permissive license, http://www.gnu.org/prep/maintain/maintain.html#License-Notices-for-Other-Files ||
|| BSD-2 || Two-clause BSD license ||
|| BSD-3 || Three-clause BSD license, with no-endorsement clause, as seen in /usr/share/common-licenses/BSD ||
|| BSD-4 || Four-clause BSD license, with no-endorsement clause and advertising clause; GPL-incompatible (need exact text) ||

The convention for license abbreviations seems to be settling on XYZ for a license with only one version, and XYZ-n for *sequential version n* of the XYZ license, where XYZ-3 is chronologically earlier than XYZ-4 and later than XYZ-2. I disagree that we should break this semantic for the BSD-style licenses, where the number usually discussed is the *number of clauses* and not the sequential version number; indeed, these licenses have no recognised version number.

Perhaps the abbreviation should deliberately show the number of clauses in a way that doesn't indicate a version number: BSD-3C. But that still sounds vaguely like a version number; it still suggests sequence in the different versions that doesn't actually match the true chronology; it also fails to indicate *which* clauses are included.

In the absence of a clear succession of differently-numbered consecutive versions of a license text, my proposal is: we could come up with abbreviations similar to those used for indicating active clauses in the Creative Commons licenses:

This way, no false impression of chronological sequence is implied, and the abbreviation provides a mnemonic for what the terms of the license actually are, not just the number of clauses in them. -- BenFinney 2008-10-15

What about just using BSD2, BSD3 and BSD4? -- NoahSlater 2008-10-15

I agree with Ben: using numbering schemes for BSD should be discouraged, because they are understood as revisions or updates, which is not the case. While abbreviations like BSD-BY-LC-NE-AD may sound practical, I find them lacking recognition. The BSD licenses have been examined by FSF, so perhaps we could use names and definitions used at [http://www.gnu.org/philosophy/license-list.html#FreeBSD License list] —JariAalto 2008-11-02:

|| keyword || GPL compatible || meaning ||
|| [http://www.gnu.org/philosophy/license-list.html#OriginalBSD BSD] || No || Also known as the “Original 4-clause BSD license”. Contains the “obnoxious BSD advertising clause”. ||
|| [http://www.gnu.org/philosophy/license-list.html#ModifiedBSD ModifiedBSD] || Yes || This is the original BSD license, modified by removal of the advertising clause. It is a simple, permissive non-copyleft free software license. ||
|| [http://www.gnu.org/philosophy/license-list.html#FreeBSD FreeBSD] || Yes || Also known as the “2-clause BSD license”. Original BSD license with the advertising clause and another clause removed. Simple, permissive non-copyleft free software license. ||
|| [http://www.gnu.org/philosophy/license-list.html#ISC OpenBSD] || Yes || Also known by the name "ISC License". This license does have an unfortunate wording choice. ||
|| [http://www.gnu.org/philosophy/license-list.html#clearbsd ClearBSD] || Yes || Based on the modified BSD license, and adds a term expressly stating it does not grant you any patent licenses. ||
|| BSD-other || ... || Any other custom license, which is not one of the above (need exact text). ||

|| keyword || meaning ||
|| Apache-1.0 || Apache license, version 1.0; not GPL-compatible ||
|| Apache-1.1 || Apache license, version 1.1; not GPL-compatible ||
|| Apache-2.0 || Apache license, version 2.0; GPL-3-compatible, not GPL-2-compatible ||
|| MPL-1.1 || Mozilla Public License, version 1.1 only, http://www.mozilla.org/MPL/MPL-1.1.html ||
|| Artistic || The original Artistic license, as seen in /usr/share/common-licenses/Artistic ||
|| Artistic-2.0 || The Artistic license, version 2.0, http://www.perlfoundation.org/artistic_license_2_0 ||
|| LPPL-1.3a || The LaTeX Project Public License, version 1.3a, http://www.latex-project.org/lppl/lppl-1-3a.txt; GPL-incompatible. Note that works under any version of the LPPL often have additional restrictions attached; check carefully. ||
|| ZPL || Zope Public License, author did not specify version ||
|| ZPL-2.1 || Zope Public License, version 2.1 only ||
|| EPL-1.1 || Erlang Public License, version 1.1 only ||
|| EFL-2 || Eiffel Forum License, version 2 only ||
|| CPL || IBM Common Public License ||
|| CC-BY-3 || Creative Commons Attribution License (Unported), version 3.0 only ||
|| CC-BY-SA-3 || Creative Commons Attribution-ShareAlike License (Unported), version 3.0 only ||
|| ZLIB || The zlib/libpng license as in http://www.opensource.org/licenses/zlib-license.php ||
|| Expat || The terms of the Expat license, http://www.jclark.com/xml/copying.txt. This license is what many people mean by "the MIT license", but that term is too ambiguous as there is more than one "MIT license" in the wild ||
|| ISC || The Internet Software Consortium's “ISC license”, http://opensource.org/licenses/isc-license.txt ||
|| W3C-Software || The W3C Software License, http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231 ||
|| CeCILL-1 || CEA-CNRS-INRIA-Logiciel Libre, version 1, http://www.cecill.info/licences/Licence_CeCILL_V1.1-US.html ||
|| CeCILL-2 || CEA-CNRS-INRIA-Logiciel Libre, version 2, http://www.cecill.info/licences/Licence_CeCILL_V2-en.html ||
|| CeCILL-B || CEA-CNRS-INRIA-Logiciel Libre B, http://www.cecill.info/licences/Licence_CeCILL-B_V1-en.html ||
|| CeCILL-C || CEA-CNRS-INRIA-Logiciel Libre C, http://www.cecill.info/licences/Licence_CeCILL-C_V1-en.html ||
|| WTFPL-2 || Do What The Fuck You Want To Public License, version 2, http://sam.zoy.org/wtfpl/COPYING ||
|| ... || add your favourite license here ||
|| other || Anything else not covered in this list; should be clarified in the following lines of the field ||

Can we have other-non-free and other-gpl[2|3]-[in]compatible? The latter would help to automate a GPL-compatibility check. The former could help with mixed free/non-free source packages -- the whole package would of course go in non-free, but it could be worth noting for a later effort to separate out the non-free parts. -- AdamPowell

Stuff that we might want but that needs to be clarified:

|| keyword || meaning ||
|| other-BSD || a BSD-like license (not sure it's wise to have this keyword, especially since it might be GPL-incompatible; if in doubt, let's stick with "other") ||
|| MIT || Several variants of the MIT license exist: the standard version with three paragraphs (blanket permission, keep this notice, NO WARRANTY), a version with a no-endorsement clause, and other versions with slight wording differences. ||
|| MIT-any || When the work is licensed under an unspecified MIT style license. ||
|| PD || In the public domain, not applicable everywhere ||

Syntax

License names are case-insensitive.

The value of the field should follow the syntax of debian/control's Depends field. The pipe character "|" is used for code that can be used under the terms of either license. The comma "," is used for code that must be used under the terms of both licenses (for rare cases where a single file contains code under both licenses).

For instance, this is a simple, "GPL version 2 or later" field:

License: GPL-2+

This is a dual-licensed GPL/Artistic work such as Perl:

License: GPL-1+ | Artistic

This is for a file that has both GPL and classic BSD code in it:

License: GPL-any, BSD-3

And this is for a file that has Perl code and classic BSD code in it:

License: GPL-1+ | Artistic, BSD-3

A GPL-2+ work with the OpenSSL exception is in effect a dual-licensed work that can be redistributed either under the GPL-2+, or under the GPL-2+ with the OpenSSL exception. It is thus expressed as "GPL-2+ | other":

License: GPL-2+ | other
 In addition, as a special exception, the author of this program gives
 permission to link the code of its release with the OpenSSL project's
 "OpenSSL" library (or with modified versions of it that use the same
 license as the "OpenSSL" library), and distribute the linked executables.
 You must obey the GNU General Public License in all respects for all of
 the code used other than "OpenSSL".  If you modify this file, you may
 extend this exception to your version of the file, but you are not
 obligated to do so.  If you do not wish to do so, delete this exception
 statement from your version.
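
Because the syntax follows the Depends field, the first line of a License field can be read as a list of comma-separated groups, each holding one or more alternatives separated by pipes. The sketch below illustrates this reading and is not a definitive grammar: parse_license is a hypothetical name, keywords are lower-cased because license names are case-insensitive, and it assumes, as in Depends, that the comma binds more loosely than the pipe.

```python
def parse_license(value):
    """Parse the first line of a License field.

    Returns a list of groups. Every group must be satisfied (comma);
    within a group, any one keyword may be chosen (pipe).
    """
    first_line = value.splitlines()[0] if value.strip() else ""
    groups = []
    for part in first_line.split(","):
        alternatives = [kw.strip().lower() for kw in part.split("|") if kw.strip()]
        if alternatives:
            groups.append(alternatives)
    return groups


simple = parse_license("GPL-2+")
perl_and_bsd = parse_license("GPL-1+ | Artistic, BSD-3")
with_exception = parse_license(
    "GPL-2+ | other\n In addition, as a special exception, ..."
)
```

Under this reading, "GPL-1+ | Artistic, BSD-3" becomes two groups: a choice between GPL-1+ and Artistic, plus a mandatory BSD-3. Any explanatory text on the following lines is ignored by the parser.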

Examples

Simple

A possible copyright file for xsol:

Format-Specification: http://wiki.debian.org/Proposals/CopyrightFormat?action=recall&rev=143
Upstream-Name: X Solitaire
Upstream-Source: ftp://sunsite.unc.edu/pub/Linux/X11/games/

Files: *
Copyright: Copyright 1998, Brian Masney <masneyb@newwave.net>
License: GPL-2+
 On Debian systems the full text of the GNU General Public License can be found
 in the `/usr/share/common-licenses/GPL' file.

Files: debian/*
Copyright: Copyright 1998, Josip Rodin <jrodin@jagor.srce.hr>
License: other
 [LICENSE TEXT]

Complex

A possible copyright file for planet-venus:

Format-Specification: http://wiki.debian.org/Proposals/CopyrightFormat?action=recall&rev=178
Upstream-Name: Planet Venus
Upstream-Maintainer: Sam Ruby <rubys@intertwingly.net>
Upstream-Source: http://www.intertwingly.net/code/venus/

Files: *
Copyright: Copyright 2008, Sam Ruby <rubys@intertwingly.net>
Copyright: Copyright 2007, Scott James Remnant <scott@netsplit.com>
Copyright: Copyright 2007, Jeff Waugh <jdub@perkypants.org>
Copyright: Copyright 2007, Eric van der Vlist <vdv@dyomedea.com>
License: PSF-2
 [LICENSE TEXT]

Files: debian/*
Copyright: Copyright 2008, Noah Slater <nslater@bytesexual.org>
License: GAP
 Copying and distribution of this package, with or without modification, are
 permitted in any medium without royalty provided the copyright notice and this
 notice are preserved.

Files: debian/patches/theme-diveintomark.patch
Copyright: Copyright 2008, Mark Pilgrim <mark@diveintomark.org>
License: MIT
 [LICENSE TEXT]

Files: planet/vendor/compat_logging/*
Copyright: Copyright 2002, Vinay Sajip <vinay_sajip@yahoo.co.uk>
License: MIT
 [LICENSE TEXT]

Files: planet/vendor/feedparser.py
Copyright: Copyright 2007, Mark Pilgrim <mark@diveintomark.org>
License: MIT
 [LICENSE TEXT]

Files: planet/vendor/httplib2/*
Copyright: Copyright 2006, Joe Gregorio <joe@bitworking.org>
License: MIT-any
 Unspecified MIT style license.

Files: planet/vendor/htmltmpl.py
Copyright: Copyright 2004, Tomas Styblo <tripie@cpan.org>
License: GPL-any
 On Debian systems the full text of the GNU General Public License can be found
 in the `/usr/share/common-licenses/GPL' file.

Files: planet/vendor/timeoutsocket.py
Copyright: Copyright 2001, Timothy O'Malley <timo@alum.mit.edu>
License: MIT
 [LICENSE TEXT]

Questions

Recent Changes