scancode-toolkit has a large number of dependencies that are currently missing from Debian. Ultimately, I'm only after the code that detects copyrights and licenses, so that decopy can import it, rather than the entire framework with its many dependencies.

Ideally, I'd like a Python library that requires no initialization or configuration, to which I can pass the contents of a file and get back the license and copyright information:

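   from __future__ import annotations  # allow forward references in annotations

   from typing import List, Tuple, Union

   # Placeholder aliases (assumptions); the real types are still to be decided.
   Year = int
   Name = str
   LicenseId = str          # e.g. an SPDX identifier
   LicenseExpression = str  # e.g. an SPDX license expression


   def detect_copyright_and_licenses(text: bytes) -> Union[List[Info], FullLicense]:
       """Reads the contents of a file and returns license info.

       For pure license files, just returns a FullLicense object,
       otherwise returns a list of Info objects.
       """
       raise NotImplementedError  # interface sketch only

   class FullLicense:
       license_id: LicenseId
       text: str

   class Info:
       # Each field should also carry its offset information in the file

       # List of copyright holders
       copyright_holders: List[Tuple[Year, Name]]

       # License identifier (e.g. "GPL-2.0 or Apache-2.0-or-later")
       license_id: LicenseExpression

       # the license header and warranty statement as found in the file
       # verbatim
       reference: str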
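
To make the intended call pattern concrete, here is a toy sketch of how such a function might behave. It is not how scancode-toolkit works: it only recognises `SPDX-License-Identifier:` tags and simple `Copyright <year> <name>` lines via regular expressions, and it spots full license texts by a hypothetical title table (`FULL_LICENSE_TITLES`). The type aliases and regex names are assumptions made for illustration, and offsets are left out.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

# Assumption: plain aliases stand in for the richer types in the interface.
Year = int
Name = str
LicenseId = str
LicenseExpression = str


@dataclass
class FullLicense:
    license_id: LicenseId
    text: str


@dataclass
class Info:
    copyright_holders: List[Tuple[Year, Name]]
    license_id: LicenseExpression
    reference: str


# Hypothetical table: first line of a full license text -> license id.
FULL_LICENSE_TITLES = {"MIT License": "MIT"}

# Toy patterns; real-world headers are far more varied than this.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([\w.+\-() ]+)")
COPYRIGHT_RE = re.compile(r"Copyright\s+(?:\(c\)\s*)?(\d{4})\s+(.+)", re.IGNORECASE)


def detect_copyright_and_licenses(text: bytes) -> Union[List[Info], FullLicense]:
    """Toy detector: returns FullLicense, or a list with at most one Info."""
    decoded = text.decode("utf-8", errors="replace")

    # A file that starts with a known license title is treated as a pure license file.
    stripped = decoded.lstrip()
    for title, license_id in FULL_LICENSE_TITLES.items():
        if stripped.startswith(title):
            return FullLicense(license_id=license_id, text=decoded)

    spdx = SPDX_RE.search(decoded)
    if spdx is None:
        return []  # nothing this toy detector understands

    holders = [
        (int(match.group(1)), match.group(2).strip())
        for match in COPYRIGHT_RE.finditer(decoded)
    ]
    return [
        Info(
            copyright_holders=holders,
            license_id=spdx.group(1).strip(),
            reference=spdx.group(0).strip(),
        )
    ]


if __name__ == "__main__":
    sample = (
        b"# Copyright (c) 2019 Jane Doe\n"
        b"# SPDX-License-Identifier: GPL-2.0-or-later\n"
        b"\n"
        b"print('hello')\n"
    )
    print(detect_copyright_and_licenses(sample))
```

The point of the sketch is the shape of the call: bytes in, structured objects out, with no setup step in between. A caller such as decopy would only need `isinstance` checks on the result to tell a pure license file from an ordinary source file.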

Status

In progress/done

Ideally avoided, doesn't seem essential for license detection

Unsure

Dependencies of other things not packaged in Debian