If you take a look at the training folder, you’ll notice there is now a third subfolder: Analyze. Much of the spam we’ve been receiving lately has been of the Bayesian poisoning variety, which means that sending it to the trainer will actually make the trainer less effective. I am starting to look at the more deterministic rules to see whether any of them can be strengthened to make spam detection more effective.
The Analyze folder is part of that effort. Messages sent to that folder will be parsed for relevant spam information and then added to a database where I’m collecting statistics about rulesets. I will make the statistics I collect available to everyone (via mission control) so you can see how I’m making decisions about the rulesets.
Please keep in mind that, unlike messages sent to the other two folders, messages sent to the analyzer will be saved, so that I can collect other data from the spam corpus as necessary. If you have a problem with this, do not use the Analyze folder.
If you so choose, you can send copies of a message to both the Spam and Analyze folders. Please do not put messages marked as Ham into the Analyze folder, as that will skew the results. Thanks!
Update: The spam analyzer interface is now available via the mission control menu. If you have time, take a look at it and either email me your comments or, even better, leave a comment on this post. I’ll follow up with proposed changes in the comments.