Spam in the Debian List Archive

Note that all this is very preliminary. Comments and suggestions are very welcome.

Status quo

It has been claimed that the [http://lists.debian.org/ Debian list archives] contain spam email messages.

There is a "report as spam" button on the list archive page of each message, but at present spam is by and large not removed from the archives. The submissions seem to help (more or less) with finding spam, but they need manual review before they can be acted upon.

Towards a spam removal policy

Policy cornerstones

Ad hoc policy

Review standards should be set after seeing how things pan out. I am aiming at three reviewers, including one experienced one (after some bootstrapping), which I hope will minimize the risk of unwarranted removal. A rigorous standard seems to be necessary to obtain consensus within the project. For now, the figure of three reviewers is only a guideline, not a rule. Of course, more reviewers doing shorter reviews would help tremendously. Ultimately, guaranteeing the integrity of the list archives currently falls in the realm of the Debian listmaster.
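
To make the guideline concrete, here is a minimal sketch of how reports from several reviewers could be combined. It is not part of newspamclassify.py. The function name, the shape of the `reports` dictionary, and the message IDs are all hypothetical and serve only to illustrate the "three reviewers, including one experienced one" idea.

```python
from collections import defaultdict


def removable_messages(reports, experienced, min_reviewers=3):
    """Return message IDs that enough reviewers flagged as spam.

    reports: dict mapping reviewer name -> set of message IDs flagged as spam
             (hypothetical format, not the one used by any existing tool)
    experienced: set of reviewer names counted as experienced
    min_reviewers: guideline threshold, three by default
    """
    # Invert the reports: for each message, collect who flagged it.
    flagged_by = defaultdict(set)
    for reviewer, message_ids in reports.items():
        for message_id in message_ids:
            flagged_by[message_id].add(reviewer)

    # Keep messages with enough reviewers, at least one of them experienced.
    return sorted(
        message_id
        for message_id, reviewers in flagged_by.items()
        if len(reviewers) >= min_reviewers and reviewers & experienced
    )


# Made-up example data.
reports = {
    "alice": {"msg001", "msg002"},
    "bob": {"msg001", "msg002", "msg004"},
    "carol": {"msg001", "msg003", "msg004"},
    "dave": {"msg002", "msg003", "msg004"},
}
print(removable_messages(reports, experienced={"alice"}))
```

With this data, only msg001 and msg002 qualify: msg003 has two reviewers, and msg004 has three but none of them is listed as experienced. Since the threshold is a guideline rather than a rule, it is kept as a parameter.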

Practical matters

About using newspamclassify.py:

Any suggestions on the above and/or the program are of course welcome.

Suggested Improvements

People doing this

If you want to jump in, add yourself here and contact [mailto:tv@beamnet.de Thomas] (tomv_w on IRC) for coordination. Your help is appreciated.

Works in progress

Our goal is to have at least three reports before removing anything. For the following lists, we have some review reports, but not enough. The people mentioned have already sent in reports. Your help is of most immediate use if you review lists that already have some, but not enough, names listed. Please add your name after you have sent in your report.

People

Success stories

Getting program and data


CategoryTeams