I've fed DSPAM thousands of spam messages and am only getting marginal accuracy. What's up?
A. Your problem might be that you’ve fed DSPAM thousands of spam messages but not enough nonspam (ham) for it to learn adequately. Feeding a statistical filter a grossly unbalanced corpus of mail is generally bad practice. In addition, if you’re using a version of DSPAM that has a “training buffer” enabled by default, feeding it a large amount of spam can cause it to start watering down its results until you feed it more ham. This watering down is an attempt to prevent false positives, and it gets stronger as your spam ratio rises, so the more spam you feed it without matching ham, the worse your accuracy will get.
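
To make the effect concrete, here is a minimal sketch in Python of how a dampening step tied to the spam ratio could behave. It is only an illustration of the idea described above and is not DSPAM's actual training buffer formula; the function names spam_ratio and watered_down, and the linear dampening rule, are assumptions made up for this example.

```python
def spam_ratio(spam_trained: int, ham_trained: int) -> float:
    """Fraction of the training corpus that is spam (0.0 if nothing trained)."""
    total = spam_trained + ham_trained
    if total == 0:
        return 0.0
    return spam_trained / total


def watered_down(token_prob: float, ratio: float) -> float:
    """Hypothetical dampening, NOT DSPAM's real formula.

    Pulls a spam-leaning token probability toward the neutral value 0.5
    as the corpus becomes more spam-heavy. A balanced corpus (ratio 0.5)
    leaves the probability untouched; an all-spam corpus (ratio 1.0)
    flattens it to 0.5.
    """
    if token_prob <= 0.5 or ratio <= 0.5:
        return token_prob
    skew = (ratio - 0.5) / 0.5  # 0.0 when balanced, 1.0 when all spam
    return 0.5 + (token_prob - 0.5) * (1.0 - skew)


if __name__ == "__main__":
    # Same token, same raw probability, increasingly unbalanced corpora.
    for spam, ham in [(1000, 1000), (3000, 1000), (9000, 1000)]:
        ratio = spam_ratio(spam, ham)
        print(spam, ham, round(ratio, 2), round(watered_down(0.9, ratio), 2))
```

Under these assumptions, a token that looks strongly spammy carries less and less weight as the spam ratio climbs, which is why feeding more ham, rather than more spam, is what restores accuracy.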