This page describes a proposal to make debian/copyright machine-interpretable. The file is one of the most important in Debian packaging, yet its format is vague and varies tremendously across packages, which makes it difficult to parse automatically.

This is not a proposal to change the policy in the short term.


Rationale

Because free software licenses are so diverse, Debian needs to care not only about the freeness of a given work, but also about the compatibility of its license with the other parts of Debian it uses.

The arrival of the GPL version 3, its incompatibility with version 2, and our inability to spot the software where that incompatibility might be problematic are the most recent illustration of this limitation.

There are precedents, too. One is the GPL/OpenSSL incompatibility. Apart from grepping debian/copyright, which is prone to numerous false positives (packaging under the GPL but software under another license) and false negatives (GPL software with an "OpenSSL special exception" form of dual licensing), there is no reliable way to know which software in Debian might be problematic.

And there is more to come. There are issues with shipping GPLv2-only software with a CDDL-licensed operating system such as Nexenta. The GPL version 3 solves this issue, but not all GPL software can switch to it, and we have no way to know how much of Debian would have to be stripped from such a system.

Proposal

I suggest adding simple RFC 2822 multiline fields to debian/copyright containing machine-interpretable values for copyright holders, known licenses, upstream URLs, etc. -- SamHocevar

These fields should be clear enough to obviate duplicating their information somewhere else in the file.

Compatibility and human-readability

It is important that debian/copyright remains human-readable, and thus that this proposal is not overengineered by adding too many fields. However, I believe that the format as proposed remains clear enough to a human (as suggested by the examples at the end).

The file should be encoded in UTF-8.

For clarity we should recommend separating machine-interpretable parts with empty lines. This is not mandatory.

-- Also, it is important to allow any form of free text in the file, be it before or after the machine-interpretable part. I therefore suggest that fields can be interspersed anywhere in the file. Lines that neither start with a known field name nor start with a space and follow a valid line should be ignored by an interpreter. -- SamHocevar

-- It's probably a good idea to keep a human-readable (traditional) part in the copyright file. While the proposed format is quite easily readable for technically experienced people, it might not be so for less technical people, and they should not be barred from understanding the copyright situation of a given package. I know that this is extra work, but it's good practice IMHO, and it can probably be dropped once interpreters for the copyright file exist. (PatrickSchoenfeld)

-- I think it is important to keep the files in strict RFC 2822 format, and I think that the included human-readable licence text is certainly readable with only "." separating the paragraphs. If we are going to make this machine-parsable, we should go all the way. If interpreters are needed, they can and will be built. -- NoahSlater

-- I also agree that everything should be machine-parsable. See the Notice section. -- MathieuParent
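The lenient reading rule suggested above (ignore any line that neither starts with a known field name nor continues a valid field) could be sketched as follows. This is only an illustration: the function name, the regular expression, and the KNOWN_FIELDS subset are assumptions made for the example, not part of the proposal.

```python
import re

# Illustrative subset of field names; the proposal does not fix an official list.
KNOWN_FIELDS = {"Format-Specification", "Upstream-Author", "Debianized-By",
                "Debianized-Date", "Original-Source-Location",
                "Files", "Copyright", "License"}

FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")

def parse_lenient(text):
    """Return (name, value) pairs in file order, skipping free text."""
    fields = []
    in_field = False  # True while the previous line belonged to a field
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m and m.group(1) in KNOWN_FIELDS:
            fields.append([m.group(1), m.group(2)])
            in_field = True
        elif in_field and line.startswith(" "):
            # Continuation line: drop the single leading space, keep the rest.
            fields[-1][1] += "\n" + line[1:]
        else:
            # Free text or an empty line: ignored, and it ends the current field.
            in_field = False
    return [tuple(f) for f in fields]

sample = (
    "This package was packaged by hand.\n"
    "Files: *\n"
    "License: GPL-2+\n"
    " This package is free software.\n"
    " .\n"
    " See the GPL.\n"
    "Some trailing note.\n"
)
parsed = parse_lenient(sample)
```

A strict RFC 2822 reader, as argued for in the replies, would instead reject the free-text lines rather than skip them.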

Implementation

I have proposed some lintian checks. You can discuss implementation details in [http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=478930 bug 478930] -- MathieuParent

Sections

Header section (once)

Why have Format-Specification and Upstream-Author had the "only once" requirement removed? Multiple values can be placed on new lines within the same field. -- NoahSlater

Can we take this opportunity to use the word "packaged" instead of "debianized"? -- NoahSlater

The header should be RFC 2822 compliant.

Example:

Format-Specification: http://wiki.debian.org/Proposals/CopyrightFormat
Upstream-Author: John Soft <some.email@example.com>
Debianized-By: Robert Package <some.email@example.org>
Debianized-Date: Sun, 10 Jun 2007 16:13:07 +0000
Original-Source-Location: http://sourceforge.net/projects/software/
Original-Source-Command: ./debian/rules get-orig-source
Original-Source-Size: 389120
Original-Source-MD5: 934d927eecfdb5a1a4a17798de3ed60f
Original-Source-Depends: autoconf, automake, libtool, subversion-tools
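Since the header is meant to be RFC 2822 compliant, an existing mail-header parser can read it. The sketch below uses Python's standard email module on a shortened copy of the header above; treating Original-Source-Size as an integer and Debianized-Date as an RFC 2822 date is an assumption drawn from the example values, not something the proposal states.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Shortened copy of the example header, for illustration only.
HEADER = """\
Format-Specification: http://wiki.debian.org/Proposals/CopyrightFormat
Upstream-Author: John Soft <some.email@example.com>
Debianized-By: Robert Package <some.email@example.org>
Debianized-Date: Sun, 10 Jun 2007 16:13:07 +0000
Original-Source-Size: 389120
"""

header = message_from_string(HEADER)

author = header["Upstream-Author"]                               # plain string
packaged_on = parsedate_to_datetime(header["Debianized-Date"])   # timezone-aware datetime
size = int(header["Original-Source-Size"])                       # assumed to be a byte count
```

Field lookups with this parser are case-insensitive, which may or may not be what the final format wants.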

Files section (repeatable)

The details are yet to be discussed. Here is a list of what is needed, and which fields I suggest to add:

License section (repeatable)

Where several sets of files share the same licence, you can avoid repetition: give only the licence keywords in the License field of each Files section, and add one standalone License section per keyword to hold the full licence text.

Files: src/js/editline/*
Copyright: Copyright 1993, Simmule Turner
 Copyright 1993, Rich Salz
License: MPL-1.1 | GPL-2 | LGPL-2.1

Files: src/js/fdlibm/*
Copyright: Copyright 1993, Sun Microsystems Corporation
License: MPL-1.1 | GPL-2 | LGPL-2.1

License: MPL-1.1
 [LICENCE TEXT]

License: GPL-2
 [LICENCE TEXT]

License: LGPL-2.1
 [LICENCE TEXT]
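A checker (for instance a lintian-style test) might want to verify that every keyword used by a Files section without inline text is expanded by a standalone License section. The following is a rough sketch under that assumption; the function name and the representation of sections as dictionaries are hypothetical.

```python
def missing_license_texts(stanzas):
    """Return keywords used in Files sections that no standalone License section expands.

    stanzas: list of dicts such as {"Files": "...", "License": "MPL-1.1 | GPL-2"}.
    The first line of a License value holds the keywords; later lines hold the text.
    """
    defined, used = set(), set()
    for stanza in stanzas:
        first_line, _, rest = stanza.get("License", "").partition("\n")
        keywords = {k.strip().lower()
                    for k in first_line.replace(",", "|").split("|")
                    if k.strip()}
        if "Files" in stanza:
            if not rest.strip():      # keywords only: text must come from elsewhere
                used |= keywords
        else:
            defined |= keywords       # standalone License section
    return sorted(used - defined)

stanzas = [
    {"Files": "src/js/editline/*", "License": "MPL-1.1 | GPL-2 | LGPL-2.1"},
    {"License": "MPL-1.1\n[LICENCE TEXT]"},
    {"License": "GPL-2\n[LICENCE TEXT]"},
]
missing = missing_license_texts(stanzas)
```

Here the LGPL-2.1 keyword is reported because no standalone section supplies its text.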

-- Hi Mathieu, and Noah, and thanks for your work on this page. I have a question about multiple license fields: often the GPL statements will use the actual name of the program used instead of "This package". Do you think that it is possible to handle this correctly with the proposed multiple license fields, and if yes, how ? -- CharlesPlessy

-- Note that the GPL does not permit modification of the licence text itself, so if you find a modified GPL it is not correct to call it GPL. What I think you may be referring to is customised boilerplate text such as "Copyright 1999, Jim Bob. Available under the GPL", which is not a problem, as you are not expected to put this kind of information in the copyright file.

Fields detail

Files: field

Field format

The contents of the Files field should be a list of comma-separated values:

Files: foo.c, bar.*, baz.[ch]

Files containing spaces or commas should be put within double quotes. The backslash character is an escaping character, be it inside or outside double quotes:

Files: "Program Files/*", manual\[english\].txt
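The quoting rules above can be illustrated with a small splitter. It is a sketch only: the function name is made up, and it assumes that backslash escapes are kept verbatim in the result so that the pattern-matching step can interpret them later, which the proposal does not spell out.

```python
def split_files_field(value):
    """Split the value of a Files field into a list of patterns."""
    patterns, current = [], []
    in_quotes = False
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # Escaped character: never treated as a quote or separator.
            # The pair is kept as-is for the pattern matcher (assumption).
            current.append(value[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            patterns.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated double quote in Files field")
    last = "".join(current).strip()
    if last:
        patterns.append(last)
    return patterns

simple = split_files_field("foo.c, bar.*, baz.[ch]")
quoted = split_files_field('"Program Files/*", manual\\[english\\].txt')
```

Note that this sketch also strips leading and trailing spaces inside quotes, which a real implementation would probably want to preserve.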

Pattern syntax

Patterns are the ones recognised by the find utility's -name and -wholename flags. They behave as if find had been called in the following way from the top source directory:

find . -wholename "$PATTERN"

This will match all Makefile.am files in the tree and all Python scripts:

Files: */Makefile.am, *.py

But this will only match the top-level Makefile.am:

Files: ./Makefile.am

Special rule: if a pattern $PATTERN does not match any file in the source, it is implicitly considered to be expanded to */$PATTERN. This is to avoid insane verbosity when referring to a unique file buried deep in the tree.
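As a rough approximation of these rules, Python's fnmatch module can stand in for find -wholename, since its "*" also matches across "/" characters. The helper name is hypothetical, paths are assumed to be written relative to the top source directory with a leading "./" as find prints them, and unlike find, fnmatch does not treat the backslash as an escape character.

```python
import fnmatch

def match_pattern(pattern, paths):
    """Return the paths matched by one Files pattern, applying the special rule."""
    matched = [p for p in paths if fnmatch.fnmatchcase(p, pattern)]
    if not matched:
        # Special rule: a pattern that matches nothing is retried as */PATTERN.
        matched = [p for p in paths if fnmatch.fnmatchcase(p, "*/" + pattern)]
    return matched

paths = ["./Makefile.am", "./src/Makefile.am", "./src/main.py", "./src/getopt.c"]

all_makefiles = match_pattern("*/Makefile.am", paths)
top_makefile = match_pattern("./Makefile.am", paths)
python_files = match_pattern("*.py", paths)
buried_file = match_pattern("getopt.*", paths)   # only matches through the special rule
```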

Match order

It is quite common for a work to have most of its files under a given license, and only a few files (for instance, embedded getopt.c and getopt.h) under another. However it makes more sense to have the copyright file list the "main" license first.

Matches should be exclusive (a file can only match one rule). The rule that applies is the most specific one (the one that matches the fewest files) or, if this is ambiguous, the last one in the file.

-- What is the most specific one? Can you propose an algorithm? -- MathieuParent

Thus, in this case of getopt.c, it is the second rule that has to be taken into account:

Files: *
Copyright: [the main work’s author]
License: [the main work’s license]
Files: getopt.*
Copyright: © 2000 the NetBSD Foundation, Inc.
License: other-BSD
 [text of the NetBSD license]
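One possible reading of "most specific" is sketched below: for each file, pick the matching rule that covers the fewest files overall and break ties in favour of the rule that appears last. This is only one candidate algorithm, not an agreed definition; the function names are hypothetical and fnmatch is used as an approximation of find -wholename.

```python
import fnmatch

def files_matching(pattern, paths):
    """Paths matched by one pattern, retried as */PATTERN if nothing matches."""
    hits = {p for p in paths if fnmatch.fnmatchcase(p, pattern)}
    if not hits:
        hits = {p for p in paths if fnmatch.fnmatchcase(p, "*/" + pattern)}
    return hits

def assign_rules(rules, paths):
    """Map each path to the index of the rule that applies to it.

    rules: one list of patterns per Files section, in file order.
    """
    coverage = []
    for patterns in rules:
        hits = set()
        for pattern in patterns:
            hits |= files_matching(pattern, paths)
        coverage.append(hits)

    result = {}
    for path in paths:
        candidates = [i for i, hits in enumerate(coverage) if path in hits]
        if candidates:
            # Fewest matched files wins; on a tie, the later rule wins.
            result[path] = min(candidates, key=lambda i: (len(coverage[i]), -i))
    return result

paths = ["./main.c", "./getopt.c", "./getopt.h"]
rules = [["*"], ["getopt.*"]]
assignment = assign_rules(rules, paths)
```

With the two rules of the example, the getopt files go to the second rule and everything else to the first.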

License: field

License keywords

The License field should not contain arbitrary values, which is why there needs to be a list of accepted keywords, each with a very specific, unambiguous meaning. Here is a non-exhaustive list; please help fill it with popular license names we are likely to meet in Debian:

|| keyword || meaning ||
|| GPL-any || GNU General Public License, author did not specify version (probably the same as GPL-1+) ||
|| GPL-1 || GNU General Public License, version 1 only ||
|| GPL-1+ || GNU General Public License, version 1 or later (probably the same as GPL-any) ||
|| GPL-2 || GNU General Public License, version 2 only ||
|| GPL-2+ || GNU General Public License, version 2 or later ||
|| GPL-3 || GNU General Public License, version 3 only ||
|| GPL-3+ || GNU General Public License, version 3 or later ||
|| LGPL-any || GNU Library/Lesser General Public License, author did not specify version ||
|| LGPL-2 || GNU Library General Public License, version 2 only ||
|| LGPL-2+ || GNU Library General Public License, version 2 or later ||
|| LGPL-2.1 || GNU Lesser General Public License, version 2.1 only ||
|| LGPL-2.1+ || GNU Lesser General Public License, version 2.1 or later ||
|| LGPL-3 || GNU Lesser General Public License, version 3 only ||
|| LGPL-3+ || GNU Lesser General Public License, version 3 or later ||
|| PSF || Python License, author did not specify version ||
|| PSF-2 || Python License, version 2 only ||
|| GFDL-any || GNU Free Documentation License, author did not specify version (maybe this needs mention of the fact that we accept no invariant sections, etc.) ||
|| GFDL-1.1 || GNU Free Documentation License, version 1.1 only (same note as above) ||
|| GFDL-1.1+ || GNU Free Documentation License, version 1.1 or newer (same note as above) ||
|| GFDL-1.2 || GNU Free Documentation License, version 1.2 only (same note as above) ||
|| GFDL-1.2+ || GNU Free Documentation License, version 1.2 or newer (same note as above) ||
|| GAP || GNU All-Permissive license, http://www.gnu.org/prep/maintain/maintain.html#License-Notices-for-Other-Files ||
|| BSD-2 || Two-clause BSD license ||
|| BSD-3 || Three-clause BSD license, with no-endorsement clause, as seen in /usr/share/common-licenses/BSD ||
|| BSD-4 || Four-clause BSD license, with no-endorsement clause and advertising clause; GPL-incompatible (need exact text) ||
|| Apache-1.0 || Apache license, version 1.0; not GPL-compatible ||
|| Apache-1.1 || Apache license, version 1.1; not GPL-compatible ||
|| Apache-2.0 || Apache license, version 2.0; GPL-3-compatible, not GPL-2-compatible ||
|| MPL-1.1 || Mozilla Public License, version 1.1 only, http://www.mozilla.org/MPL/MPL-1.1.html ||
|| Artistic || The original Artistic license, as seen in /usr/share/common-licenses/Artistic ||
|| Artistic-2.0 || The Artistic license, version 2.0, http://www.perlfoundation.org/artistic_license_2_0 ||
|| LPPL-1.3a || The LaTeX Project Public License, version 1.3a, http://www.latex-project.org/lppl/lppl-1-3a.txt; GPL-incompatible. Note that works under any version of the LPPL often have additional restrictions attached; check carefully. ||
|| ZPL || Zope Public License, author did not specify version ||
|| ZPL-2.1 || Zope Public License, version 2.1 only ||
|| EPL-1.1 || Erlang Public License, version 1.1 only ||
|| EFL-2 || Eiffel Forum License, version 2 only ||
|| CPL || IBM Common Public License ||
|| CC-BY-3 || Creative Commons Attribution License (Unported), version 3.0 only ||
|| CC-BY-SA-3 || Creative Commons Attribution-ShareAlike Licence (Unported), version 3.0 only ||
|| ZLIB || The zlib/libpng license as in http://www.opensource.org/licenses/zlib-license.php ||
|| ... || add your favourite license here ||
|| other || Anything else not covered in this list, should be clarified in the following lines of the field ||

Stuff that we might want but that needs to be clarified:

|| other-BSD || a BSD-like license (not sure it's wise to have this keyword, especially since it might be GPL-incompatible; if in doubt, let's stick with "other") ||
|| MIT || Several variants of the MIT license exist: the standard version with three paragraphs (blanket permission, keep this notice, NO WARRANTY), a version with a no-endorsement clause, and other versions with slight wording differences. ||
|| MIT-any || When the work is licenced under an unspecified MIT style licence. ||
|| PD || In the public domain, not applicable everywhere ||

Syntax

License names are case-insensitive.

The syntax of the field should follow that of the Depends field in debian/control. The pipe character "|" is used for code that can be used under the terms of either license. The comma "," is used for code that must be used under the terms of both licenses (for the rare cases where a single file contains code under both licenses). As in Depends, "|" binds more tightly than ",".

For instance, this is a simple, "GPL version 2 or later" field:

License: GPL-2+

This is a dual-licensed GPL/Artistic work such as Perl:

License: GPL-1+ | Artistic

This is for a file that has both GPL and classic BSD code in it:

License: GPL-any, BSD-3

And this is for a file that has Perl code and classic BSD code in it:

License: GPL-1+ | Artistic, BSD-3
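Read this way, the first line of a License field is a conjunction of alternatives, exactly like a Depends line. A minimal sketch of such a parser follows; the function name and the nested-list result are illustrative choices, and keywords are lowercased because license names are case-insensitive.

```python
def parse_license(value):
    """Parse the first line of a License field.

    Returns a list of groups that must all apply (","), each group being
    a list of alternatives ("|"). Continuation lines are ignored here.
    """
    lines = value.splitlines()
    first_line = lines[0] if lines else ""
    return [
        [alternative.strip().lower() for alternative in group.split("|")]
        for group in first_line.split(",")
        if group.strip()
    ]

simple = parse_license("GPL-2+")
dual = parse_license("GPL-1+ | Artistic")
mixed = parse_license("GPL-1+ | Artistic, BSD-3")
```

For the last example the result has two groups: the Perl-style choice between GPL-1+ and Artistic, and BSD-3, both of which apply to the file.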

A GPL-2+ work with the OpenSSL exception is in effect a dual-licensed work that can be redistributed either under the GPL-2+, or under the GPL-2+ with the OpenSSL exception. It is thus expressed as "GPL-2+ | other":

License: GPL-2+ | other
 In addition, as a special exception, the author of this program gives
 permission to link the code of its release with the OpenSSL project's
 "OpenSSL" library (or with modified versions of it that use the same
 license as the "OpenSSL" library), and distribute the linked executables.
 You must obey the GNU General Public License in all respects for all of
 the code used other than "OpenSSL".  If you modify this file, you may
 extend this exception to your version of the file, but you are not
 obligated to do so.  If you do not wish to do so, delete this exception
 statement from your version.

Examples

Simple example

Here is a very simple example. This is the original copyright file for xsol:

This package was debianized by Josip Rodin <jrodin@jagor.srce.hr> on Sun,  8 Nov 1998 18:00:00 +0100
Original source may be found at: ftp://sunsite.unc.edu/pub/Linux/X11/games/
Upstream author: Brian Masney <masneyb@newwave.net>.
Licensed under the terms of GNU GPL v2 (or later).
On Debian systems, the complete text of the GNU General Public License can be found in file "/usr/share/common-licenses/GPL".

And this is a possible machine-interpretable format:

Format-Specification: http://wiki.debian.org/Proposals/CopyrightFormat
Debianized-By: Josip Rodin <jrodin@jagor.srce.hr>
Debianized-Date: Sun,  8 Nov 1998 18:00:00 +0100
Original-Source: ftp://sunsite.unc.edu/pub/Linux/X11/games/

Files: debian/*
Copyright: Copyright 1998, Josip Rodin <jrodin@jagor.srce.hr>
License: other
 [LICENCE TEXT]

Files: *
Copyright: Brian Masney <masneyb@newwave.net>
License: GPL-2+
 This package is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation; either version 2 of the License, or
 (at your option) any later version.
 .
 On Debian systems, the complete text of the GNU General Public License
 can be found in file "/usr/share/common-licenses/GPL".

Complex example

Can we please get another example for this section? As funny as the WTFPL is, I think there are better choices for this section. -- NoahSlater

This is the original copyright file for monsterz:

This package was downloaded from http://sam.zoy.org/monsterz/
monsterz.c, monsterz.py: Copyright (c) 2004-2005 Sam Hocevar <sam@zoy.org>
 |             DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
 |                     Version 2, December 2004
 |
 |  Copyright (C) 2004 Sam Hocevar
 |   22 rue de Plaisance, 75014 Paris, France
 |  Everyone is permitted to copy and distribute verbatim or modified
 |  copies of this license document, and changing it is allowed as long
 |  as the name is changed.
 |
 |             DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
 |    TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
 |
 |   0. You just DO WHAT THE FUCK YOU WANT TO.
music.s3m: Copyright (c) 1998 MenTaLguY <http://moonbase.rydia.net/>
 |  music.s3m was put in the public domain by MenTaLguY.
applause.wav, pop.wav: Copyright (c) 2002, 2005 Sun Microsystems, Inc.
 |  applause.wav was taken from OpenOffice.org's applause.wav and pop.wav was
 |  taken from OpenOffice.org's laser.wav. This product is made available
 |  subject to the terms of GNU Lesser General Public License Version 2.1.
click.wav: Copyright (c) Michael Speck <kulkanie@gmx.net>
 |  click.wav was taken from Barrage's click.wav. This program is free
 |  software; you can redistribute it and/or modify it under the terms
 |  of the GNU General Public License as published by the Free Software
 |  Foundation; either version 2 of the License, or (at your option) any
 |  later version.
boing.wav, ding.wav, duh.wav, grunt.wav, laugh.wv, whip.wav:
  Copyright (C) 2003 by David White <davidnwhite@optusnet.com.au> and the
  Battle for Wesnoth project
  Copyright (C) 2006 Sam Hocevar <sam@zoy.org>
 |  boing.wav was taken from Wesnoth's spear.wav and reworked by Sam
 |  Hocevar, ding.wav was taken from receive.wav, duh.wav was taken from
 |  female-strong-hit.wav, grunt.wav was taken from dwarf-die.wav, laugh.wav
 |  was taken from zombie-hit.wav, whip.wav was taken from dagger-swish.wav.
 |  This program is free software; you can redistribute it and/or modify
 |  it under the terms of the GNU General Public License. This program is
 |  distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY.
warning.wav: Copyright (c) Mike Kershaw <dragorn@kismetwireless.net>
 |  warning.wav was taken from Kismet's alert.wav. It is distributed under
 |  the terms of the GNU General Public License.
On Debian GNU/Linux systems, the complete text of the GNU General
Public License can be found in `/usr/share/common-licenses/GPL' and the
complete text of the GNU Lesser General Public License can be found in
`/usr/share/common-licenses/LGPL'.

Proposed format:

Format-Specification: http://wiki.debian.org/Proposals/CopyrightFormat
Original-Source: http://sam.zoy.org/monsterz/
Files: debian/*
Copyright: © 2004-2007 Sam Hocevar <sam@zoy.org>
License: GPL-2+
 The Debian packaging information is under the GPL, version 2 or later
Files: *.c, *.py
Copyright: © 2004-2005 Sam Hocevar <sam@zoy.org>
License: other-BSD
              DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
                      Version 2, December 2004
 .
   Copyright (C) 2004 Sam Hocevar
    22 rue de Plaisance, 75014 Paris, France
   Everyone is permitted to copy and distribute verbatim or modified
   copies of this license document, and changing it is allowed as long
   as the name is changed.
 .
              DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
     TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
 .
    0. You just DO WHAT THE FUCK YOU WANT TO.
Files: music.s3m
Copyright: © 1998 MenTaLguY <http://moonbase.rydia.net/>
License: PD
 music.s3m was put in the public domain by MenTaLguY.
Files: applause.wav, pop.wav
Copyright: © 2002, 2005 Sun Microsystems, Inc.
License: LGPL-2.1
 applause.wav was taken from OpenOffice.org's applause.wav and pop.wav was
 taken from OpenOffice.org's laser.wav. This product is made available
 subject to the terms of GNU Lesser General Public License Version 2.1.
Files: click.wav
Copyright: © Michael Speck <kulkanie@gmx.net>
License: GPL-2+
 click.wav was taken from Barrage's click.wav. This program is free
 software; you can redistribute it and/or modify it under the terms
 of the GNU General Public License as published by the Free Software
 Foundation; either version 2 of the License, or (at your option) any
 later version.
Files: boing.wav, ding.wav, duh.wav, grunt.wav, laugh.wav, whip.wav
Copyright: © 2003 by David White <davidnwhite@optusnet.com.au> and the
                  Battle for Wesnoth project
           © 2006 Sam Hocevar <sam@zoy.org>
License: GPL-any
 boing.wav was taken from Wesnoth's spear.wav and reworked by Sam
 Hocevar, ding.wav was taken from receive.wav, duh.wav was taken from
 female-strong-hit.wav, grunt.wav was taken from dwarf-die.wav, laugh.wav
 was taken from zombie-hit.wav, whip.wav was taken from dagger-swish.wav.
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License. This program is
 distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY.
Files: warning.wav
Copyright: © Mike Kershaw <dragorn@kismetwireless.net>
License: GPL-any
 warning.wav was taken from Kismet's alert.wav. It is distributed under
 the terms of the GNU General Public License.
On Debian GNU/Linux systems, the complete text of the GNU General
Public License can be found in `/usr/share/common-licenses/GPL' and the
complete text of the GNU Lesser General Public License can be found in
`/usr/share/common-licenses/LGPL'.

This is what it could look like in vim:

attachment:debian-copyright-vim-syntax.png
