scancode-toolkit has a large number of dependencies that are currently missing from Debian. Ultimately, I'm only after the code that detects copyrights and licenses, so that decopy can import it, rather than the entire framework with its many dependencies.

Ideally, I'd like a Python library that needs no initialization or configuration. I would pass it the contents of a file and get back the license and copyright information, through an interface along these lines:

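   from __future__ import annotations

   from dataclasses import dataclass
   from typing import List, Tuple, Union

   # Placeholder type aliases; the concrete types are still open.
   Year = str
   Name = str
   LicenseId = str          # e.g. an SPDX identifier such as "MIT"
   LicenseExpression = str  # e.g. an SPDX license expression

   def detect_copyright_and_licenses(text: bytes) -> Union[List[Info], FullLicense]:
       """Reads the contents of a file and returns license info.

       For pure license files, just returns a FullLicense object,
       otherwise returns a list of Info objects.
       """
       raise NotImplementedError

   @dataclass
   class FullLicense:
       license_id: LicenseId
       text: str

   @dataclass
   class Info:
       # For every field, the offsets where it was found in the file
       # should also be recorded.

       # List of copyright holders
       copyright_holders: List[Tuple[Year, Name]]

       # License expression (e.g. "GPL-2.0-or-later OR Apache-2.0")
       license_id: LicenseExpression

       # The license header and warranty statement as found in the file,
       # verbatim
       reference: str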
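
To make the intended input and output concrete, here is a deliberately naive sketch of the `Info` branch of that interface. It is a toy built on assumptions and does not follow scancode's approach. It only recognises `SPDX-License-Identifier:` tags and single-line `Copyright` notices via regular expressions. It does not handle the `FullLicense` case or multi-line license headers. The `offsets` field is hypothetical and stores character positions in the decoded text rather than byte offsets.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Naive patterns (an assumption for this sketch, not scancode's rules):
# an SPDX tag, optionally closed by "*/", and a one-line notice such as
# "Copyright (C) 2019-2021 Jane Doe".
SPDX_RE = re.compile(
    r"SPDX-License-Identifier:[ \t]*(.+?)[ \t]*(?:\*/)?[ \t]*$", re.MULTILINE
)
COPYRIGHT_RE = re.compile(
    r"Copyright[ \t]+(?:\(c\)[ \t]+|©[ \t]+)?"
    r"(\d{4}(?:[ \t]*[-,][ \t]*\d{4})*)[ \t]+(.+?)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


@dataclass
class Info:
    copyright_holders: List[Tuple[str, str]] = field(default_factory=list)
    license_id: str = ""
    # Here only the matched SPDX tag, verbatim (not the full header).
    reference: str = ""
    # Hypothetical: (start, end) character offsets per field name.
    offsets: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)


def detect_copyright_and_licenses(text: bytes) -> List[Info]:
    """Return one Info for the whole file, or [] if nothing was found."""
    content = text.decode("utf-8", errors="replace")
    info = Info()
    for m in COPYRIGHT_RE.finditer(content):
        # group(1) is the year or year range, group(2) the holder's name.
        info.copyright_holders.append((m.group(1), m.group(2)))
        info.offsets.setdefault("copyright_holders", []).append((m.start(), m.end()))
    spdx = SPDX_RE.search(content)
    if spdx:
        info.license_id = spdx.group(1)
        info.reference = spdx.group(0)
        info.offsets["license_id"] = [(spdx.start(), spdx.end())]
    return [info] if (info.copyright_holders or info.license_id) else []
```

A real implementation would need to match full license texts and headers the way scancode does. This sketch only shows the shape of what goes in and what comes back.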


In progress/done

Ideally avoided, doesn't seem essential for license detection


Dependencies of other things not packaged in Debian