For better, near-real-time spam removal we should rely on a database.
The current "database" is a flat file using this format:
post:count:msgid:ham:spam:inappropriate
post --> debian-www/2005/08/msg00051.html (a path in the archive; unique)
count --> 936 (number of nominations through the "report as Spam" button in the web archive)
msgid --> the Message-ID of the message (not unique: the same Message-ID can appear on more than one mailing list, and possibly on many messages)
ham --> comma-separated list of ham voters from the review system
spam --> comma-separated list of spam voters from the review system
inappropriate --> comma-separated list of inappropriate voters from the review system
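
To make the record layout above concrete, here is a minimal parsing sketch in Python. It is not the script actually in use. The names `Record` and `parse_line` are hypothetical, and it assumes that neither the post path, the count, nor the voter names contain a ":", so that any extra ":" in a line must belong to the Message-ID.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    post: str                 # unique path in the archive
    count: int                # nominations via the "report as Spam" button
    msgid: str                # Message-ID (not unique)
    ham: List[str]            # voters who marked the post as ham
    spam: List[str]           # voters who marked the post as spam
    inappropriate: List[str]  # voters who marked the post as inappropriate


def _voters(field: str) -> List[str]:
    # An empty field means nobody voted in that category.
    return [v.strip() for v in field.split(",") if v.strip()]


def parse_line(line: str) -> Record:
    # Assumption: only the Message-ID may contain ":", so take the first two
    # fields from the left and the last three fields from the right.
    post, count, rest = line.rstrip("\n").split(":", 2)
    msgid, ham, spam, inappropriate = rest.rsplit(":", 3)
    return Record(post, int(count), msgid,
                  _voters(ham), _voters(spam), _voters(inappropriate))
```

For example, `parse_line("debian-www/2005/08/msg00051.html:936:<abc@example.org>:alice,bob:carol:")` would give a record with two ham voters, one spam voter and no inappropriate voters; the Message-ID and voter names here are made up for illustration.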
We run over this file once a day and generate the exclude files.
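
A rough sketch of what such a daily pass could look like, in Python. The decision rule is purely an assumption for illustration: this note does not say when a post counts as spam, so the sketch excludes a post when it has at least `min_spam_votes` spam votes and more spam votes than ham votes. The function name `select_excluded`, the threshold, and the grouping by list name (the first path component) are all hypothetical, and the real format of the exclude files is not covered here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def count_voters(field: str) -> int:
    # Number of non-empty names in a comma-separated voter list.
    return sum(1 for v in field.split(",") if v.strip())


def select_excluded(lines: Iterable[str],
                    min_spam_votes: int = 2) -> Dict[str, List[str]]:
    """Return the posts to exclude, grouped by mailing list name."""
    excluded = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Same layout as the flat file: post:count:msgid:ham:spam:inappropriate
        post, _count, rest = line.split(":", 2)
        _msgid, ham, spam, _inappropriate = rest.rsplit(":", 3)
        spam_votes = count_voters(spam)
        # Assumed rule: enough spam votes, and more spam votes than ham votes.
        if spam_votes >= min_spam_votes and spam_votes > count_voters(ham):
            list_name = post.split("/", 1)[0]
            excluded[list_name].append(post)
    return {name: sorted(posts) for name, posts in excluded.items()}
```

The result maps each list name to the archive paths that would go into that list's exclude file; writing the files themselves is left out because their format is not described here.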