This page is about a proposal to make debian/copyright machine-interpretable. It is one of the most important files in Debian packaging, yet its format is vague and varies tremendously across packages, making it difficult to automatically parse.

This is not a proposal to change the policy in the short term.

Rationale

The diversity of free software licences means that Debian needs to care not only about the freeness of a given work, but also about its licence's compatibility with the other parts of Debian it uses.

The arrival of the GPL version 3, its incompatibility with version 2, and our inability to spot the software where the incompatibility might be problematic is the most recent occurrence of this limitation.

There are also a few precedents. One is the GPL/OpenSSL incompatibility. Apart from grepping debian/copyright, which is prone to numerous false positives (packaging under the GPL but software under another license) and false negatives (GPL software with an "OpenSSL special exception" form of dual licensing), there is no reliable way to know which software in Debian might be problematic.

And there is more to come. The GPL version 3 is compatible with the CDDL, but the GPL version 2 isn’t. Which means that in the near future, GPLv2-only software cannot be distributed as part of a CDDL operating system such as Nexenta. We have no way to know how much of Debian should be stripped from such a system.

Proposal

I suggest adding simple RFC 2822-style multiline fields to debian/copyright, containing machine-interpretable values for copyright holders, known licenses, upstream URLs, etc.

These fields should be clear enough to obviate duplicating their information somewhere else in the file.

Compatibility and human-readability

It is important to have debian/copyright remain human-readable, and thus not to overengineer this proposal by adding too many fields. However I believe that as it is, it remains clear enough to a human (as suggested by the examples at the end).

Also, it is important to allow any form of free text in the file, be it before or after the machine-interpretable part. I therefore suggest that fields can be interspersed anywhere in the file. An interpreter should ignore any line that neither starts with a known field name nor starts with a space and follows a valid line.

For clarity we should recommend separating machine-interpretable parts with empty lines.
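The ignore rule above can be sketched in a few lines of Python. This is only an illustration, not a reference parser: the set of known field names (Files, Copyright, License) is an assumption taken from the examples on this page, and the function name is hypothetical.

```python
# Assumed field names, taken from the examples on this page.
KNOWN_FIELDS = ("Files", "Copyright", "License")

def parse_fields(text):
    """Return a list of (name, value) pairs, skipping free text."""
    fields = []
    in_field = False  # True while the previous line belonged to a field
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in KNOWN_FIELDS:
            # A line starting with a known field name opens a new field.
            fields.append((name, [rest.strip()]))
            in_field = True
        elif in_field and line.startswith(" "):
            # A line starting with a space continues the previous field.
            fields[-1][1].append(line[1:])
        else:
            # Free text or an empty line: ignored, and it ends the field.
            in_field = False
    return [(name, "\n".join(lines)) for name, lines in fields]
```

With this sketch, a sentence such as "Original source may be found at: ..." is skipped because the text before the colon is not a known field name.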

Fields

The details are yet to be discussed. Here is a list of what is needed, and which fields I suggest adding:

File patterns

Field format

The contents of the Files field should be a list of comma-separated values:

Files: foo.c, bar.*, baz.[ch]

File names containing spaces or commas should be put within double quotes. The backslash is an escape character, both inside and outside double quotes:

Files: "Program Files/*", manual\[english\].txt
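A possible way to split such a value is sketched below in Python. It is a sketch under assumptions: the function name is hypothetical, and the backslash is kept in the output (together with the character it escapes) so that the pattern matcher can interpret it later. The proposal does not say whether the splitter or the matcher should consume it.

```python
def split_files_field(value):
    """Split a Files value into patterns, honouring quotes and backslashes."""
    patterns, current = [], []
    in_quotes = False
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            # Keep the backslash and the escaped character together,
            # so an escaped comma or quote never acts as a delimiter.
            current.append(ch + next(chars, ""))
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            patterns.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    patterns.append("".join(current).strip())
    # Drop empty entries, e.g. from a trailing comma.
    return [p for p in patterns if p]
```

Note that surrounding whitespace is stripped from every pattern, including quoted ones; a stricter implementation might want to preserve leading or trailing spaces inside quotes.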

Pattern syntax

File paths are usually expressed from the top source directory. This will only match the top-level Makefile.am:

Files: Makefile.am

Patterns are the ones recognised by the find utility's -name and -wholename flags. They behave as if find had been called in the following way from the top source directory:

find * -wholename "$PATTERN"

This will match all Python files in any subdirectory, at any depth, since with -wholename the * wildcard also matches the / separator:

Files: */*.py

Special rule: if a pattern $PATTERN does not match any file in the source, it is implicitly considered to be expanded to */$PATTERN. This is to avoid insane verbosity when referring to a unique file buried deep in the tree.
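The matching rule and its fallback could be approximated as follows. This is a minimal sketch, assuming Python's fnmatch.fnmatchcase as a stand-in for find -wholename: both let * match the / separator, but fnmatch does not treat the backslash as an escape character, so escaped patterns would need extra handling. The function name and the list of paths are hypothetical.

```python
from fnmatch import fnmatchcase

def match_pattern(pattern, paths):
    """Return the paths selected by one pattern.

    paths are relative to the top source directory, without a
    leading "./", as printed by `find *`.
    """
    hits = [p for p in paths if fnmatchcase(p, pattern)]
    if not hits:
        # Special rule: a pattern that matches nothing is retried
        # as */PATTERN, so it can reach files buried in the tree.
        hits = [p for p in paths if fnmatchcase(p, "*/" + pattern)]
    return hits
```

For instance, with a tree containing Makefile.am and src/Makefile.am, the pattern Makefile.am selects only the top-level file, while a pattern such as main.c falls back to */main.c when there is no main.c at the top level.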

Match order

It is quite common for a work to have most of its files under a given license, and only a few files (for instance, embedded getopt.c and getopt.h) under another. However it makes more sense to have the copyright file list the "main" license first.

Matches should be exclusive (a file can only match one rule). The rule that applies to a file is the most specific one (the one that matches the fewest files) or, if this is ambiguous, the last one in the file.

Thus, in the case of getopt.c, it is the second rule that has to be taken into account:

Files: *
Copyright: [the main work’s author]
License: [the main work’s license]

Files: getopt.*
Copyright: © 2000 the NetBSD Foundation, Inc.
License: other-bsd
 [text of the NetBSD license]
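The selection rule can be sketched as below. This is an illustration only: the function name and data layout (one list of patterns per rule, in file order) are assumptions, fnmatch.fnmatchcase stands in for find -wholename, and the */PATTERN fallback from the previous section is left out for brevity.

```python
from fnmatch import fnmatchcase

def select_rule(path, rules, all_paths):
    """Return the index of the rule that applies to path, or None.

    rules is a list of pattern lists, in the order they appear in
    debian/copyright; all_paths lists every file in the source tree.
    """
    best = None  # (rule index, number of files the rule matches)
    for index, patterns in enumerate(rules):
        files = {p for p in all_paths
                 if any(fnmatchcase(p, pat) for pat in patterns)}
        if path not in files:
            continue
        # The rule matching the fewest files wins; "<=" lets a later
        # rule win a tie, so the last one in the file is chosen.
        if best is None or len(files) <= best[1]:
            best = (index, len(files))
    return None if best is None else best[0]
```

With the two rules above ("*" and "getopt.*"), getopt.c is matched by both, but the second rule matches fewer files and is therefore the one selected.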

License keywords

The "License" field should not contain arbitrary values, which is why there needs to be a list of accepted keywords, each with a very specific, unambiguous meaning. Here is a non-exhaustive list; please help fill it with popular license names we are likely to meet in Debian:

|| keyword || meaning ||
|| GPL || GNU General Public License, author did not specify version (probably the same as GPLv1+) ||
|| GPLv1 || GNU General Public License, version 1 only ||
|| GPLv1+ || GNU General Public License, version 1 or later (probably the same as GPL) ||
|| GPLv2 || GNU General Public License, version 2 only ||
|| GPLv2+ || GNU General Public License, version 2 or later ||
|| GPLv3 || GNU General Public License, version 3 only ||
|| GPLv3+ || GNU General Public License, version 3 or later ||
|| LGPL || GNU Lesser General Public License, author did not specify version ||
|| LGPLv2.1 || GNU Lesser General Public License, version 2.1 only ||
|| GFDL || GNU Free Documentation License, author did not specify version (maybe this needs mention of the fact that we accept no invariant sections, etc.) ||
|| GFDLv1.2 || GNU Free Documentation License, version 1.2 only (same note as above) ||
|| BSD || classic three-clause BSD license, as seen in /usr/share/common-licenses/BSD ||
|| Artistic || Artistic license ||
|| ... || add your favourite license here ||
|| other || anything else not covered in this list; should be clarified in the following lines of the field ||

Stuff that we might want but that needs to be clarified:

|| other-BSD || a BSD-like license (not sure it's wise to have this keyword, especially since it might be GPL-incompatible; if in doubt, let's stick with "other") ||
|| PD || public domain, not applicable everywhere ||

License names are case-insensitive.

The syntax of the field should follow that of debian/control's Depends field. The pipe character "|" is used for code that can be used under the terms of either license. The comma "," is used for code that must be used under the terms of both licenses (for rare cases where a single file contains code under both licenses). Parentheses "()" are used to link to a file in /usr/share/common-licenses that contains the exact text of the license.

For instance, this is a simple, "GPL version 2 or later" field:

License: GPLv2+ (/usr/share/common-licenses/GPL-2)

This is a dual-licensed GPL/Artistic work such as Perl:

License: GPLv1+ | Artistic

This is for a file that has both GPL and BSD code in it:

License: GPL (/usr/share/common-licenses/GPL), BSD

And this is for a file that has Perl code and BSD code in it:

License: GPLv1+ | Artistic, BSD

A GPLv2+ work with the OpenSSL exception is in effect a dual-licensed work that can be redistributed either under the GPLv2+, or under the GPLv2+ with the OpenSSL exception. It is thus expressed as "GPLv2+ | other":

License: GPLv2+ (/usr/share/common-licenses/GPL-2) | other
 In addition, as a special exception, the author of this program gives
 permission to link the code of its release with the OpenSSL project's
 "OpenSSL" library (or with modified versions of it that use the same
 license as the "OpenSSL" library), and distribute the linked executables.
 You must obey the GNU General Public License in all respects for all of
 the code used other than "OpenSSL".  If you modify this file, you may
 extend this exception to your version of the file, but you are not
 obligated to do so.  If you do not wish to do so, delete this exception
 statement from your version.
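To make the Depends-like syntax concrete, here is a rough Python sketch that parses the first line of a License field. It is not a definitive grammar: the function name is hypothetical, and the sketch assumes that "|" binds tighter than "," (as in Depends) and that the path in parentheses contains neither character. Keywords are lowercased because license names are case-insensitive.

```python
import re

# One alternative: a keyword, optionally followed by "(path)".
ALTERNATIVE = re.compile(r"^\s*([^\s()|,]+)\s*(?:\(\s*([^()]*?)\s*\))?\s*$")

def parse_license(first_line):
    """Parse the first line of a License field.

    Returns a list of groups that must all be satisfied (","), each
    group being a list of (keyword, path-or-None) alternatives ("|").
    """
    groups = []
    for group in first_line.split(","):
        alternatives = []
        for alt in group.split("|"):
            m = ALTERNATIVE.match(alt)
            if m is None:
                raise ValueError(f"cannot parse license alternative: {alt!r}")
            alternatives.append((m.group(1).lower(), m.group(2)))
        groups.append(alternatives)
    return groups
```

Under these assumptions, "GPLv1+ | Artistic, BSD" yields two groups: the first offers a choice between gplv1+ and artistic, the second requires bsd.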

Examples

Simple example

Here is a very simple example. This is the original copyright file for xsol:

This package was debianized by Josip Rodin <jrodin@jagor.srce.hr> on
Sun,  8 Nov 1998 18:00:00 +0100

Original source may be found at: ftp://sunsite.unc.edu/pub/Linux/X11/games/

Upstream author: Brian Masney <masneyb@newwave.net>.

Licensed under the terms of GNU GPL v2 (or later).

On Debian systems, the complete text of the GNU General Public License
can be found in file "/usr/share/common-licenses/GPL".

And this is a possible machine-interpretable format:

Original source may be found at: ftp://sunsite.unc.edu/pub/Linux/X11/games/

Files: debian/*
Copyright: [previous packager whose copyright might still apply]
           © 1998 Josip Rodin <jrodin@jagor.srce.hr>
License: [license of the packaging itself, if meaningful]

Files: *
Copyright: Brian Masney <masneyb@newwave.net>
License: GPLv2+ (/usr/share/common-licenses/GPL-2)

Complex example

This is the original copyright file for monsterz:

This package was downloaded from http://sam.zoy.org/monsterz/

monsterz.c, monsterz.py: Copyright (c) 2004-2005 Sam Hocevar <sam@zoy.org>

 |             DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
 |                     Version 2, December 2004
 |
 |  Copyright (C) 2004 Sam Hocevar
 |   22 rue de Plaisance, 75014 Paris, France
 |  Everyone is permitted to copy and distribute verbatim or modified
 |  copies of this license document, and changing it is allowed as long
 |  as the name is changed.
 |
 |             DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
 |    TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
 |
 |   0. You just DO WHAT THE FUCK YOU WANT TO.

music.s3m: Copyright (c) 1998 MenTaLguY <http://moonbase.rydia.net/>

 |  music.s3m was put in the public domain by MenTaLguY.

applause.wav, pop.wav: Copyright (c) 2002, 2005 Sun Microsystems, Inc.

 |  applause.wav was taken from OpenOffice.org's applause.wav and pop.wav was
 |  taken from OpenOffice.org's laser.wav. This product is made available
 |  subject to the terms of GNU Lesser General Public License Version 2.1.

click.wav: Copyright (c) Michael Speck <kulkanie@gmx.net>

 |  click.wav was taken from Barrage's click.wav. This program is free
 |  software; you can redistribute it and/or modify it under the terms
 |  of the GNU General Public License as published by the Free Software
 |  Foundation; either version 2 of the License, or (at your option) any
 |  later version.

boing.wav, ding.wav, duh.wav, grunt.wav, laugh.wav, whip.wav:
  Copyright (C) 2003 by David White <davidnwhite@optusnet.com.au> and the
  Battle for Wesnoth project
  Copyright (C) 2006 Sam Hocevar <sam@zoy.org>

 |  boing.wav was taken from Wesnoth's spear.wav and reworked by Sam
 |  Hocevar, ding.wav was taken from receive.wav, duh.wav was taken from
 |  female-strong-hit.wav, grunt.wav was taken from dwarf-die.wav, laugh.wav
 |  was taken from zombie-hit.wav, whip.wav was taken from dagger-swish.wav.
 |  This program is free software; you can redistribute it and/or modify
 |  it under the terms of the GNU General Public License. This program is
 |  distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY.

warning.wav: Copyright (c) Mike Kershaw <dragorn@kismetwireless.net>

 |  warning.wav was taken from Kismet's alert.wav. It is distributed under
 |  the terms of the GNU General Public License.

On Debian GNU/Linux systems, the complete text of the GNU General
Public License can be found in `/usr/share/common-licenses/GPL' and the
complete text of the GNU Lesser General Public License can be found in
`/usr/share/common-licenses/LGPL'.

Proposed format:

This package was downloaded from http://sam.zoy.org/monsterz/

Files: debian/*
Copyright: © 2004-2007 Sam Hocevar <sam@zoy.org>
License: GPLv2+ (/usr/share/common-licenses/GPL-2)
 The Debian packaging information is under the GPL, version 2 or later

Files: *.c, *.py
Copyright: © 2004-2005 Sam Hocevar <sam@zoy.org>
License: other
              DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
                      Version 2, December 2004
 .
   Copyright (C) 2004 Sam Hocevar
    22 rue de Plaisance, 75014 Paris, France
   Everyone is permitted to copy and distribute verbatim or modified
   copies of this license document, and changing it is allowed as long
   as the name is changed.
 .
              DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
     TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
 .
    0. You just DO WHAT THE FUCK YOU WANT TO.

Files: music.s3m
Copyright: © 1998 MenTaLguY <http://moonbase.rydia.net/>
License: PD
 music.s3m was put in the public domain by MenTaLguY.

Files: applause.wav, pop.wav
Copyright: © 2002, 2005 Sun Microsystems, Inc.
License: LGPLv2.1 (/usr/share/common-licenses/LGPL-2.1)
 applause.wav was taken from OpenOffice.org's applause.wav and pop.wav was
 taken from OpenOffice.org's laser.wav. This product is made available
 subject to the terms of GNU Lesser General Public License Version 2.1.

Files: click.wav
Copyright: © Michael Speck <kulkanie@gmx.net>
License: GPLv2+ (/usr/share/common-licenses/GPL-2)
 click.wav was taken from Barrage's click.wav. This program is free
 software; you can redistribute it and/or modify it under the terms
 of the GNU General Public License as published by the Free Software
 Foundation; either version 2 of the License, or (at your option) any
 later version.

Files: boing.wav, ding.wav, duh.wav, grunt.wav, laugh.wav, whip.wav
Copyright: © 2003 by David White <davidnwhite@optusnet.com.au> and the
                  Battle for Wesnoth project
           © 2006 Sam Hocevar <sam@zoy.org>
License: GPL (/usr/share/common-licenses/GPL)
 boing.wav was taken from Wesnoth's spear.wav and reworked by Sam
 Hocevar, ding.wav was taken from receive.wav, duh.wav was taken from
 female-strong-hit.wav, grunt.wav was taken from dwarf-die.wav, laugh.wav
 was taken from zombie-hit.wav, whip.wav was taken from dagger-swish.wav.
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License. This program is
 distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY.

Files: warning.wav
Copyright: © Mike Kershaw <dragorn@kismetwireless.net>
License: GPL (/usr/share/common-licenses/GPL)
 warning.wav was taken from Kismet's alert.wav. It is distributed under
 the terms of the GNU General Public License.

This is what it could look like in Vim:

attachment:debian-copyright-vim-syntax.png