Comment: I have been considering that we may need to alter how we develop this standard. Editing on a wiki with inline comments is pretty frustrating, and it becomes troublesome when people make large changes without much discussion beforehand. What do people think about moving this page, or locking it so that it cannot be edited arbitrarily, and starting a mailing list for discussion, so that we can reach consensus about changes before making them? I would like to have a threadable and searchable discussion archive for this. -- NoahSlater

Comment: I think the proposal has been developed and popularised enough that a [http://dep.debian.net/deps/dep0/ DEP] could be proposed (but after the Lenny release, please!). Discussing a DEP will be a more efficient way to resolve the remaining issues than debating them on the wiki. -- CharlesPlessy

Comment: This sounds like a fine idea. I think that the wiki is starting to hinder, rather than promote, the development of this standard. Let's wait for Lenny and move forward from there. -- NoahSlater

This is a proposal to make debian/copyright machine-interpretable. This file is one of the most important files in Debian packaging, yet its existing format is vague and varies tremendously across packages, making it difficult to automatically parse.

This is not a proposal to change the policy in the short term.


Rationale

The diversity of free software licenses means that Debian needs to care not only about the freeness of a given work, but also about its license's compatibility with the other parts of Debian it uses.

The arrival of GPL version 3, its incompatibility with version 2, and our inability to identify the software for which this incompatibility might be a problem are a prominent example of this limitation.

There are also earlier precedents. One is the GPL/OpenSSL incompatibility. Apart from grepping debian/copyright, which is prone to numerous false positives (packaging under the GPL but software under another license) and false negatives (GPL software with an "OpenSSL special exception" dual-licensing form), there is no reliable way to know which software in Debian might be problematic.

And there is more to come. There are issues with shipping GPLv2-only software with a CDDL operating system such as Nexenta. The GPL version 3 solves this issue, but not all GPL software can switch to it and we have no way to know how much of Debian should be stripped from such a system.

Comment: Apparently, [http://fedoraproject.org/wiki/Licensing Fedora started a very similar project]: one of my upstreams (the Samba Team) pointed out to me the few differences in how licenses are named, particularly the short form and the method used to combine licenses. -- ChristianPerrier

Compatibility and Human-Readability

The file must be encoded in UTF-8 and strictly formatted as a superset of RFC2822, including significant newlines. Free-form text is not allowed.

The debian/copyright file must be machine-interpretable, yet human-readable, while communicating all mandated upstream information, copyright notices and licensing details.

For the sake of human-readability this proposal avoids any complex field names or syntax rules.

Lintian

You can discuss implementation details in [http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=478930 bug 478930] -- MathieuParent

Implementation

Sections

Header Section (Once)

The header should be RFC2822-compliant and consist of multiple fields.

Examples:

Format-Specification:
    http://wiki.debian.org/Proposals/CopyrightFormat?action=recall&rev=196
Upstream-Name: SOFTware
Upstream-Maintainer: John Doe <john.doe@example.com>
Upstream-Source: http://www.example.com/software/project
Upstream-Vcs-URI: type=git; uri=http://git.example.com/project.git

Format-Specification:
    http://wiki.debian.org/Proposals/CopyrightFormat?action=recall&rev=196
Upstream-Name: xyz
Upstream-Maintainer: Jane Smith <jane.smith@example.com>
Upstream-Vcs-Browser: http://www.example.com/gitwww
Upstream-Vcs-URI: type=git; uri=http://git.example.com/xyz.git

Files Section (Repeatable)

The declaration of copyright and license for files is done in one or more stanzas, formatted as RFC2822-style fields. Each stanza is separated from others by a blank line.

Example:

Files: *
Copyright: Copyright 2008, John Doe <john.doe@example.com>
Copyright: Copyright 2007, Jane Smith <jane.smith@example.com>
License: PSF-2
 [LICENSE TEXT]
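The stanza structure described above is simple enough to read with a few lines of code. The following Python sketch is one possible reading, not a reference parser. It makes three assumptions:

* stanzas are separated by blank lines;
* a line starting with a space or tab continues the previous field;
* a field name may occur more than once per stanza, because the examples repeat Copyright.

The name parse_stanzas is hypothetical.

```python
from typing import Dict, List, Optional

def parse_stanzas(text: str) -> List[Dict[str, List[str]]]:
    """Split RFC2822-style text into stanzas (hypothetical helper).

    Each stanza maps a field name to a list of values, because a field
    such as Copyright may be repeated within one stanza.
    """
    stanzas: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    last_name: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza, if there is one.
            if current:
                stanzas.append(current)
            current, last_name = {}, None
        elif line[0] in " \t":
            # Continuation line: extend the most recent field value.
            if last_name is None:
                raise ValueError("continuation line outside a field: %r" % line)
            previous = current[last_name][-1]
            rest = line[1:]
            # If the field's first line was empty (as with Format-Specification
            # in the header examples), the continuation becomes the value itself.
            current[last_name][-1] = previous + "\n" + rest if previous else rest.strip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("not a field: %r" % line)
            last_name = name.strip()
            current.setdefault(last_name, []).append(value.strip())
    if current:
        stanzas.append(current)
    return stanzas
```

Fed the Files stanza above, this yields a single stanza whose Copyright entry holds both notices. Its License entry holds the keyword line followed by the license text.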

Discussion

Comment: There is currently a [http://lists.debian.org/debian-devel-announce/2006/03/msg00023.html rather definite request (requirement?)] that copyright files include not just an assertion that the code is under a particular licence, but also the text stating that this is the case (e.g. the standard GPL statement used by many: "This program is free software ... You should have received ..."). This appears to be inconsistent with the currently proposed format. Are the ftp-masters happy with the format as described here, and with what amounts to the removal of this statement from the copyright file? Or is it envisaged that the text after "License:" includes this statement? If so, some clarification is required and the examples should show it. -- StuartPrescott

Comment: That email is over two years old now. I can only assume, given the large number of people and packages using this copyright format as it stands, that the FTP Masters are happy with it to date. -- NoahSlater

Comment: The initial proposal suggested Copyright: <"Copyright" | "©" | "Copr."> .... The field name itself already expresses this, so a simple format suffices: <list of years> "," <Firstname> <Lastname> ["<" email ">"]. Note that "(C)" is legally null; the only recognized abbreviations for the word Copyright are "Copr." (possibly valid only in the US) and "©".

Comment: I really wish you would request comments before making changes like this; I am getting rather frustrated. The Copyright field should include whatever copyright statement the original copyright holders have chosen. They certainly do not need to provide a personal name, let alone in the rather rigid and internationally insensitive "<Firstname> <Lastname>" format. The value of this field is to be reproduced verbatim, so it needs to include the word "Copyright" or similar, even if this looks like duplication to you.

Comment: In [http://www.gnu.org/licenses/gpl-howto.html FSF's instructions], abbreviations are discouraged. Please add a link to clarify the legal status of "Copr." etc. -- JariAalto

Standalone License Section

Where a set of files is dual-licensed (or tri-licensed, etc.), you must use a single-line License field and use standalone License fields to expand the license keywords.

Files: src/js/editline/*
Copyright: Copyright 1993, John Doe
Copyright: Copyright 1993, Joe Average
License: MPL-1.1 | GPL-2 | LGPL-2.1

License: MPL-1.1
 [LICENSE TEXT]

License: GPL-2
 [LICENSE TEXT]

License: LGPL-2.1
 [LICENSE TEXT]

Where multiple sets of files use the same license, you can avoid repetition by giving each a single-line License field and using one separate standalone License field to expand the license keyword.

Files: src/js/editline/*
Copyright: Copyright 1993, John Doe
Copyright: Copyright 1993, Joe Average
License: MPL-1.1

Files: src/js/fdlibm/*
Copyright: Copyright 1993, J-Random Corporation
License: MPL-1.1

License: MPL-1.1
 [LICENSE TEXT]
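Because a one-line License value only names keywords, a tool can check that every keyword used by a Files stanza has its full text in some stanza. Here is a minimal Python sketch. It makes three assumptions:

* each stanza has already been read into a dict of field name to value (a simplified, hypothetical shape);
* a License value with text below its first line supplies the full text for the keyword on that first line;
* license names are compared case-insensitively.

Both function names are hypothetical.

```python
import re
from typing import Dict, List

def license_keywords(expression: str) -> List[str]:
    """Split a one-line License value on "|" and "," into lower-cased keywords."""
    return [k.strip().lower() for k in re.split(r"[|,]", expression) if k.strip()]

def missing_license_texts(stanzas: List[Dict[str, str]]) -> List[str]:
    """Return keywords used by Files stanzas whose full text appears nowhere."""
    defined = set()
    used: List[str] = []
    for stanza in stanzas:
        first, _, body = stanza.get("License", "").partition("\n")
        if body.strip():
            # Text follows the first line: this stanza (a Files stanza or a
            # standalone License stanza) supplies the text for that keyword.
            defined.add(first.strip().lower())
        if "Files" in stanza:
            used.extend(license_keywords(first))
    return sorted(set(used) - defined)
```

Applied to the tri-licensed example with the LGPL-2.1 stanza left out, the sketch would report ["lgpl-2.1"]. For the shared MPL-1.1 example it reports nothing.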

License Aliases

If a common type of license is a combination of multiple licenses (like the Perl license), an alias can be defined to make clear that it is this particular combination of licenses and not just any combination.

Files: *
License: Perl

License-Alias: Perl
Licenses: GPL-1+ | Artistic

License: GPL-1+
 [LICENSE TEXT]

License: Artistic
 [LICENSE TEXT]
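If License-Alias were adopted, a tool would need to expand an alias before evaluating a License value. Here is a Python sketch under the following assumptions, none of which the proposal settles:

* aliases have been collected into a dict mapping each License-Alias name to its Licenses value;
* alias values contain only "|" alternatives;
* aliases do not nest.

The name expand_aliases is hypothetical.

```python
from typing import Dict

def expand_aliases(expression: str, aliases: Dict[str, str]) -> str:
    """Replace alias names in a License value with what they stand for.

    `aliases` maps a License-Alias name to its Licenses value, for example
    {"Perl": "GPL-1+ | Artistic"}. Names are compared case-insensitively.
    """
    lookup = {name.lower(): value for name, value in aliases.items()}
    conjuncts = []
    for conjunct in expression.split(","):
        # Each alternative is replaced by its expansion, if it is an alias.
        alternatives = [lookup.get(k.strip().lower(), k.strip())
                        for k in conjunct.split("|")]
        conjuncts.append(" | ".join(alternatives))
    return ", ".join(conjuncts)
```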

Discussion

Comment: This seems overly complex to me; let's keep it simple. Using "GPL-1+ | Artistic" for Perl licensing is fine and should cause no confusion. Adding an extra field to save 13 characters on the odd occasion that Perl-style licensing is chosen is a bad decision. -- NoahSlater

Comment: It doesn't have anything to do with saving characters; it is to declare that it is in fact, for example, the "Perl" license. This could be extended to include the exact copyright notice that upstream placed when delegating it to two licenses.

I did think of other names, such as "Multi-License" and "Merged-License", but thought "License-Alias" was sufficient. -- AzaToth

Comment: I disagree, this isn't the "Perl license" - it's the combination of licenses that Perl uses. Just like the Mozilla tri-licensed MPL|GPL|LGPL stuff, I don't see any reason why this needs to be given a distinct name. -- NoahSlater

Fields Detail

Files

Format

The value of the Files field should be a list of comma-separated values:

Files: foo.c, bar.*, baz.[ch]

File names containing spaces or commas should be enclosed in double quotes. The backslash is an escape character, both inside and outside double quotes:

Files: "Program Files/*", manual\[english\].txt
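The quoting rules above can be applied by a small tokenizer. This Python sketch is one interpretation. The proposal does not say whether the backslash is removed once it has done its job, so the sketch assumes escapes are kept verbatim (backslash included). That way a find-style matcher still sees manual\[english\].txt as a literal bracket. The name split_files_field is hypothetical.

```python
from typing import List

def split_files_field(value: str) -> List[str]:
    """Split a Files value into patterns (hypothetical helper).

    Commas separate patterns, double quotes protect spaces and commas,
    and a backslash escapes the following character inside or outside
    quotes. Escapes are kept verbatim (backslash included) so that a
    find-style matcher can still treat an escaped bracket literally.
    """
    patterns: List[str] = []
    current: List[str] = []
    in_quotes = False
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            # Keep the backslash together with the character it escapes.
            current.append(ch + next(chars, ""))
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            patterns.append("".join(current))
            current = []
        elif ch.isspace() and not in_quotes:
            # Unquoted whitespace only separates items; names with spaces
            # must be quoted.
            continue
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unterminated double quote in %r" % value)
    patterns.append("".join(current))
    return [p for p in patterns if p]
```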

Syntax

Patterns are the ones recognised by the find utility's -name and -wholename flags. They behave as if find had been called in the following way from the top source directory:

find . -wholename "$PATTERN"

This will match all Makefile.am files in the tree and all Python scripts:

Files: */Makefile.am, *.py

But this will only match the top-level Makefile.am:

Files: ./Makefile.am

Special rule: if a pattern $PATTERN does not match any file in the source, it is implicitly considered to be expanded to */$PATTERN. This is to avoid insane verbosity when referring to a unique file buried deep in the tree.
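Here is a rough Python approximation of these rules for readers who want to experiment. fnmatch.fnmatchcase lets "*" match "/", just as find's -wholename does, so it can stand in for find here. Unlike find, it does not treat a backslash as an escape. The sketch assumes the source file list is given relative to the top directory with a leading "./", as find prints it. The name match_pattern is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List

def match_pattern(pattern: str, paths: List[str]) -> List[str]:
    """Return the paths matched by one Files pattern (hypothetical helper).

    `paths` are assumed to look like find's output, e.g. "./src/main.c".
    """
    matched = [p for p in paths if fnmatchcase(p, pattern)]
    if not matched:
        # Special rule: a pattern matching nothing is treated as */PATTERN.
        matched = [p for p in paths if fnmatchcase(p, "*/" + pattern)]
    return matched
```

With paths ["./Makefile.am", "./src/Makefile.am"], the pattern ./Makefile.am returns only the first entry, while */Makefile.am returns both.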

Match Order

It is quite common for a work to have files with copyright held by different parties and received under different licenses. To allow this, multiple stanzas are allowed with different Files declarations.

However, it makes for easier reading if the copyright file lists the "main" license first: the one matching the "top level" of the work, with others listed as exceptions. To allow this, the following precedence rule applies for matching files: if multiple Files declarations match the same file, only the last match counts.

As a result, it is recommended for clarity that the stanzas appear in order from most general (e.g. Files: *) first, through to most specific. In the following example, the file getopt.c matches both Files: * and Files: getopt.*; only the last match counts, so the file getopt.c has the license declaration License: other-BSD.

Files: *
Copyright: Copyright 2003-2005, John Doe <jdoe@example.com>
License: [the main work's license]

Files: getopt.*
Copyright: Copyright 2000, The Corporation Foundation, Inc.
License: other-BSD

Files: debian/*
Copyright: Copyright [years], [the debian package copyright holder]
License: [the debian package license]

It is very common for the Debian packaging work to have a different copyright holder and/or license from the upstream work. In these cases, it is important that the debian/* pattern is placed after any other conflicting patterns.
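The "last match counts" rule can be sketched in Python by letting each matching stanza overwrite the result of earlier ones. The sketch rests on these simplifications:

* a stanza is reduced to a pair of (Files patterns, License keyword), a hypothetical shape;
* paths are written as find prints them;
* fnmatchcase approximates -wholename;
* the */PATTERN special rule is applied against the whole file list.

The function names are illustrative only.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

# A stanza reduced to (Files patterns, License keyword): a simplified,
# hypothetical shape chosen for this sketch.
Stanza = Tuple[List[str], str]

def effective_pattern(pattern: str, paths: List[str]) -> str:
    """Apply the special rule: a pattern matching nothing becomes */PATTERN."""
    if any(fnmatchcase(p, pattern) for p in paths):
        return pattern
    return "*/" + pattern

def license_map(paths: List[str], stanzas: List[Stanza]) -> Dict[str, str]:
    """Map each path to the License of the last stanza that matches it."""
    result: Dict[str, str] = {}
    for patterns, license_keyword in stanzas:
        effective = [effective_pattern(p, paths) for p in patterns]
        for path in paths:
            if any(fnmatchcase(path, p) for p in effective):
                # Later stanzas overwrite earlier ones: the last match counts.
                result[path] = license_keyword
    return result
```

Reversing the stanza order in this sketch makes Files: * win for every file. This illustrates why the general stanza should come first and debian/* after any conflicting patterns.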

License

Keywords

The "License" field, to be machine-parseable, should not contain arbitrary values. There needs to be a list of accepted keywords, each with a very specific, unambiguous meaning. The convention for license keywords is XYZ for a license with only one version, and XYZ-n for *sequential version n* of the XYZ license, where XYZ-1 is chronologically earlier than XYZ-2. The syntax of the License keyword can be defined using a grammar similar to [http://en.wikipedia.org/wiki/Extended_Backus–Naur_form EBNF]:

    License ::= <keyword> ["-" <version>] ["-BY" {<clarification>}]

    <keyword> ::=
        <License keyword FSF>
        | <License keyword BSD>
        | <License keyword well known>
        | "other"

    <License keyword FSF> ::=
        GPL | LGPL | AGPL | GFDL

    <License keyword BSD> ::=
        ... to be decided, see table below for proposals.

    <License keyword well known> ::=
        ... to be decided, see table below for proposals.

    <version>          ::= <License version>["+"]
    <License version>  ::= <Numeric version> | <Other version>
    <Numeric version>  ::= [0-9.]+
    <Other version>    ::= <Vendor's version> | <publication date>
    <Vendor's version> ::= string (* anything: "A", "B", "public" *)
    <publication date> ::= YYYYMMDD

    <clarification> ::=
        <clarification GPL>
        | <clarification GFDL>
        | <clarification CC>
        | <clarification other>

    <clarification GPL> ::=
        "-CC" (* adds the Creative Commons metadata and Commons Deed to the GPL *)

    <clarification GFDL> ::=
        "-NIV"  (* with no invariant sections *)
        | "-CC" (* adds the Creative Commons metadata and Commons Deed to the GFDL *)

    <clarification CC> ::= (* Creative Commons license variants *)
        "-NC"   (* No Commercial use *)
        | "-ND" (* No Derivative Works *)
        | "-SA" (* Share Alike *)

    <clarification other> ::= (* other license-specific keywords that
        clarify optional parts included or excluded *)

Examples of the above EBNF:

||keyword||meaning||
||Apache-1.0||Apache license, version 1.0 only||
||CC-3.0-BY-SA-ND-NC||Creative Commons Attribution License 3.0; with Share Alike, No Derivs, No commercial||
||GPL-1+||GNU General Public License, version 1 or later||
||LGPL-2||GNU Lesser General Public License, version 2 only||
||LGPL-2.1+||GNU Lesser General Public License, version 2.1 or later||
||GFDL-1.2||GNU Free Documentation License, version 1.2 only||
||GFDL-1.2+-BY-NIV||GNU Free Documentation License, version 1.2 or later, with no invariant sections||
||PSF-2||Python Software License, version 2 only||
||other||Any other custom license. Text must be copied verbatim.||
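To show that the grammar can be checked mechanically, here is a Python sketch of a parser for individual keywords. It encodes one reading of the grammar: <keyword> ["-" <version> ["+"]] ["-BY" {<clarification>}], with numeric versions only. Vendor-specific versions are not handled. Keywords may contain hyphens (e.g. W3C-Software) as long as each part starts with a letter. Matching ignores case, since license names are case-insensitive. LicenseKeyword and parse_license_keyword are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Tuple

class LicenseKeyword(NamedTuple):
    keyword: str
    version: Optional[str]
    or_later: bool
    clarifications: Tuple[str, ...]

# Assumed reading of the grammar; the lazy keyword part lets "-<digits>"
# be taken as a version and "-BY-..." as clarifications.
_KEYWORD_RE = re.compile(
    r"(?P<keyword>[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z][A-Za-z0-9]*)*?)"
    r"(?:-(?P<version>[0-9][0-9.]*)(?P<later>\+)?)?"
    r"(?:-BY(?P<clauses>(?:-[A-Za-z]+)+))?",
    re.IGNORECASE,
)

def parse_license_keyword(text: str) -> Optional[LicenseKeyword]:
    """Parse e.g. "GFDL-1.2+-BY-NIV"; return None if it does not fit."""
    m = _KEYWORD_RE.fullmatch(text.strip())
    if m is None:
        return None
    clauses = tuple(c for c in (m.group("clauses") or "").split("-") if c)
    return LicenseKeyword(m.group("keyword"), m.group("version"),
                          m.group("later") is not None, clauses)
```

Under these assumptions, "CC-3.0-BY-SA-ND-NC" parses as keyword CC, version 3.0, not "or later", with clarifications SA, ND and NC.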

Discussion

Comment: I am opposed to this suggestion. I think introducing an EBNF grammar for the license keywords is totally unnecessary. This seems like complexity for complexity's sake. No one is suggesting that this field contain arbitrary values; the original proposal was that the values would be explicitly enumerated. There is no technical reason why this is not sufficient for our needs. In the edit that added this proposal, you completely obliterated the existing list of keywords that were already in use in Debian. -- NoahSlater

Comment: On the contrary. There is no need to list every possible case, starting from GPL-1, GPL-1+, GPL-2, GPL-2+, LGPL-1 ... LGPL-2.1, LGPL-2.1+, etc. The grammar can sufficiently express how the license keywords are to be constructed. If the examples above are not sufficient, they can be extended as needed. The EBNF does not imply arbitrary values. What needs to be decided is the exact keywords that express the license names for the slots marked "to be decided". -- JariAalto

Comment: You say "there is no need", but this doesn't really make sense. We have two suggestions so far. The first enumerates every single licence using essentially opaque identifiers. The second, yours, makes the identifiers themselves meaningful by constructing a grammar for them. Both solutions have positive and negative points. I prefer the first, original solution of explicitly enumerating each license. I find the alternative overcomplex, and it buys us little for the additional complexity it adds for both implementors of this standard and those trying to produce conforming documents.

Comment: The grammar can "enumerate" licenses because that's what EBNF does. This does not in any way exclude a "whole list", if someone wished to write one with every possible combination. The idea is that as more and more licenses are added, we need to define the syntax by which license keywords are constructed. -- JariAalto

Comment: We had already written a whole list that you have subsequently removed. This list was already in use by a large number of Debian packages. Could you please restore it? -- NoahSlater

Comment: That list is expressed in the grammar; the examples section demonstrates how to apply it. No license keywords should have been lost. For example you can read: GPL-1, GPL-1+, GPL-2, GPL-2+, GPL-3, GPL-3+ etc. which are all covered. -- JariAalto

<License keyword BSD>

Proposal 1:

Comment: These licenses have no recognised version number. Perhaps the abbreviation should deliberately show the number of clauses in a way that doesn't indicate a version number: BSD-C<N>. But that still sounds vaguely like a version number. It still suggests a sequence of versions that doesn't match the true chronology, and it fails to indicate *which* clauses are included. -- BenFinney 2008-10-16:

||BSD-C2||Two-clause BSD license||
||BSD-C3||Three-clause BSD license, with no-endorsement clause, as seen in /usr/share/common-licenses/BSD||
||BSD-C4||Four-clause BSD license, with no-endorsement clause and advertising clause; GPL-incompatible (need exact text)||

Proposal 2:

In the absence of a clear succession of differently-numbered consecutive versions of a license text, my proposal is: we could come up with abbreviations similar to those used for indicating active clauses in the Creative Commons licenses. This way, no false impression of chronological sequence is implied, and the abbreviation provides a mnemonic for what the terms of the license actually are, not just the number of clauses in them. -- BenFinney 2008-10-15

||BSD-BY-LC||forms requiring only the inclusion of copyright notice and condition||
||BSD-BY-LC-NE||plus “no endorsement without permission”||
||BSD-BY-LC-NE-AD||plus “advertising required”||

Proposal 3:

While mnemonics like LC, NE and AD may sound practical (where does the LC abbreviation come from?), in the BSD case I find that they lack wider recognition. The BSD licenses have been examined by the FSF, so perhaps we could use the names and definitions from the [http://www.gnu.org/philosophy/license-list.html#FreeBSD License list] -- JariAalto:

||keyword||GPL compatible||meaning||
||[http://www.gnu.org/philosophy/license-list.html#OriginalBSD BSD]||No||Also known as the “[http://www.opensource.org/licenses/bsd-license.php Original 4-clause BSD license]”. Contains the “obnoxious BSD advertising clause”.||
||[http://www.gnu.org/philosophy/license-list.html#ModifiedBSD ModifiedBSD]||Yes||This is the original BSD license, modified by removal of the advertising clause. It is a simple, permissive non-copyleft free software license.||
||[http://www.gnu.org/philosophy/license-list.html#FreeBSD FreeBSD]||Yes||Also known as the “[http://www.freebsd.org/copyright/freebsd-license.html 2-clause BSD license]”. Original BSD license with the advertising clause and another clause removed. Simple, permissive non-copyleft free software license.||
||[http://www.gnu.org/philosophy/license-list.html#ISC OpenBSD]||Yes||Also known by the name "[http://www.opensource.org/licenses/isc-license.txt ISC License]". This license does have an unfortunate wording choice.||
||[http://www.gnu.org/philosophy/license-list.html#clearbsd ClearBSD]||Yes||Based on the modified BSD license, and adds a term expressly stating it does not grant you any patent licenses.||

<License keyword well known>

The basis for these keywords should be wide recognition, using lists such as [http://www.opensource.org/licenses/alphabetical Open Source Initiative: Licenses by Name], [http://en.wikipedia.org/wiki/List_of_FSF_approved_software_licences Wikipedia: List of FSF approved software licences] and [http://www.gnu.org/philosophy/license-list.html FSF: Various Licenses and Comments about Them].

Comment: What do you mean by "wide recognition"? I hardly see how this is relevant. If we need to use a license, we need to use a license. -- NoahSlater

Comment: For example, using the label Apache to refer to a particular type of license is widely recognized (cf. the FSF and the Open Source Initiative). This list should include only licenses that are in wide use, and refrain from listing all possible past and present licenses. The lesser-known licenses can use the category "other", with the text copied verbatim. The list below is not yet complete. -- JariAalto

TODO: This list needs better scrutiny:

Comment: What do you mean better scrutiny? For what? -- NoahSlater

||keyword||meaning||
||Apache||Apache license||
||Artistic||The Perl Artistic license. See e.g. 2.0 at http://www.perlfoundation.org/artistic_license_2_0||
||CC||[http://creativecommons.org/license Creative Commons Attribution License]||
||IBMCPL||IBM Common Public License||
||CeCILL||CEA-CNRS-INRIA-Logiciel Libre. See http://www.cecill.info/licences||
||Eiffel||Eiffel Forum License||
||Erlang||Erlang Public License||
||Expat||The terms of the Expat license, http://www.jclark.com/xml/copying.txt; this license is what many people mean by "the MIT license", but that term is too ambiguous, as there is more than one "MIT license" in the wild||
||ISC||The Internet Software Consortium's “ISC license”. See http://opensource.org/licenses/isc-license.txt||
||LatexPPL||The LaTeX Project Public License. See e.g. http://www.latex-project.org/lppl/lppl-1-3a.txt; GPL-incompatible. Note that works under any version of the License often have additional restrictions attached; check carefully.||
||MPL||Mozilla Public License. See e.g. http://www.mozilla.org/MPL/MPL-1.1.html||
||PSF||Python License||
||PHP||PHP License||
||W3C-Software||The W3C Software License. See http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231||
||ZLIB||The zlib/libpng license. See http://www.opensource.org/licenses/zlib-license.php||
||Zope||Zope Public License.||

Things that need to be clarified

If the author did not specify a version, and the version cannot be determined from the context or other files, how should the license be expressed?

||MIT||Several variants of the MIT license exist: (1) the standard version with three paragraphs (blanket permission, keep this notice, NO WARRANTY), (2) a version with a no-endorsement clause, and (3) other versions with slight wording differences. Text needs to be copied verbatim.||
||PD||[http://creativecommons.org/licenses/publicdomain/ In the public domain]. Not applicable everywhere in the world. Text needs to be copied verbatim.||

Comment: Can we have other-non-free and other-gpl[2|3]-[in]compatible? The latter would help to automate a GPL-compatibility check. The former could help with mixed free/non-free source packages -- the whole package would of course go in non-free, but it could be worth noting for a later effort to separate out the non-free parts. -- AdamPowell

Syntax

License names are case-insensitive.

The value of the field should follow the syntax of debian/control's Depends field. The pipe character "|" is used for code that can be used under the terms of either license. The comma "," is used for code that must be used under the terms of both licenses (for rare cases where a single file contains code under both licenses).

For instance, this is a simple, "GPL version 2 or later" field:

License: GPL-2+

This is a dual-licensed GPL/Artistic work such as Perl:

License: GPL-1+ | Artistic

This is for a file that has both GPL and classic BSD code in it:

License: GPL-1+, BSD-C2

And this is for a file that has Perl code and classic BSD code in it:

License: GPL-1+ | Artistic, BSD-C3
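Because the field borrows the Depends syntax, evaluating it is straightforward. The Python sketch below shows one way a tool might answer the question raised in the rationale: can this file be used under an acceptable set of licenses? The acceptable set is a hypothetical input, not something the format defines. As with Depends, the sketch assumes "|" binds more tightly than ",". Both function names are illustrative.

```python
from typing import Iterable, List

def parse_expression(value: str) -> List[List[str]]:
    """Parse a one-line License value into a list of alternative lists.

    "," separates licenses that all apply; "|" separates alternatives.
    Names are lower-cased because license names are case-insensitive.
    """
    return [[alt.strip().lower() for alt in conjunct.split("|")]
            for conjunct in value.split(",")]

def usable_under(value: str, acceptable: Iterable[str]) -> bool:
    """True if every comma-separated part has an acceptable alternative."""
    ok = {name.lower() for name in acceptable}
    return all(any(alt in ok for alt in conjunct)
               for conjunct in parse_expression(value))
```

For the last example above, parse_expression("GPL-1+ | Artistic, BSD-C3") gives [["gpl-1+", "artistic"], ["bsd-c3"]]. The file is usable under a policy that accepts Artistic and BSD-C3, even if GPL-1+ is not accepted.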

A GPL-2+ work with the OpenSSL exception is in effect a dual-licensed work that can be redistributed either under the GPL-2+, or under the GPL-2+ with the OpenSSL exception. It is thus expressed as "GPL-2+ | other":

License: GPL-2+ | other
 In addition, as a special exception, the author of this program gives
 permission to link the code of its release with the OpenSSL project's
 "OpenSSL" library (or with modified versions of it that use the same
 license as the "OpenSSL" library), and distribute the linked executables.
 You must obey the GNU General Public License in all respects for all of
 the code used other than "OpenSSL".  If you modify this file, you may
 extend this exception to your version of the file, but you are not
 obligated to do so.  If you do not wish to do so, delete this exception
 statement from your version.

Discussion

Comment: However, this description is still not what the author meant. If the field read License: GPL-2 | Artistic, then the end user could choose to accept either the GPL-2 or the Artistic license. That is not the case here: this "other" is not really a separate license but an exception added to the notice that puts the code under the GPL in the first place. There is no choice between "GPL-2+" and "other", despite what the | between the license keywords suggests. The problem stems from conflating the statement of which licenses the code is under with the licenses themselves.

Comment: Does it instead need to be written explicitly like this:

Examples

Simple

A possible copyright file for xsol:

Format-Specification: http://wiki.debian.org/Proposals/CopyrightFormat?action=recall&rev=143
Upstream-Name: X Solitaire
Upstream-Source: ftp://ftp.example.com/pub/games

Files: *
Copyright: Copyright 1998, John Doe <jdoe@example.com>
License: GPL-2+
 On Debian systems the full text of the GNU General Public License can be found
 in the `/usr/share/common-licenses/GPL' file.

Files: debian/*
Copyright: Copyright 1998, Jane Smith <jsmith@example.net>
License: other
 [LICENSE TEXT]

Complex

A possible copyright file for planet-venus:

Format-Specification: http://wiki.debian.org/Proposals/CopyrightFormat?action=recall&rev=178
Upstream-Name: Planet Venus
Upstream-Maintainer: John Doe <jdoe@example.com>
Upstream-Source: http://www.example.com/code/venus

Files: *
Copyright: Copyright 2008, John Doe <jdoe@example.com>
Copyright: Copyright 2007, Jane Smith <jsmith@example.org>
Copyright: Copyright 2007, Joe Average <joe@example.org>
Copyright: Copyright 2007, J. Random User <jr@users.example.com>
License: PSF-2
 [LICENSE TEXT]

Files: debian/*
Copyright: Copyright 2008, Dan Developer <dan@debian.example.com>
License: GAP
 Copying and distribution of this package, with or without modification, are
 permitted in any medium without royalty provided the copyright notice and this
 notice are preserved.

Files: debian/patches/theme-diveintomark.patch
Copyright: Copyright 2008, Joe Hacker <hack@example.org>
License: GPL-2+
 [LICENSE TEXT]

Files: planet/vendor/compat_logging/*
Copyright: Copyright 2002, Mark Smith <msmith@example.org>
License: MIT
 [LICENSE TEXT]

Files: planet/vendor/httplib2/*
Copyright: Copyright 2006, John Brown <brown@example.org>
License: other
 Unspecified MIT style license.

Files: planet/vendor/feedparser.py
Copyright: Copyright 2007, Mike Smith <mike@example.org>
License: PSF-2
 [LICENSE TEXT]

Files: planet/vendor/htmltmpl.py
Copyright: Copyright 2004, Thomas Brown <coder@example.org>
License: GPL-1+
 On Debian systems the full text of the GNU General Public License can be found
 in the `/usr/share/common-licenses/GPL' file.

Questions

Question: I am not quite sure this is the right place... Some licenses must be presented to the *user* who installs the system (like sun-java, AFAIK) and forbid preseeding. Some other licenses simply must be presented (like the [http://intellinuxwireless.org/?n=faq&s=license#license_1 intel ipw2200] license). Yet other licenses should be presented but allow a special case for automated installation. A field like LicenseAcceptation could then be used in preinst/postinst scripts. -- FranklinPiat
