I have noticed recently that a lot of vague, meaningless, but not obviously dodgy-looking spam is getting through your filters. These are sentences that are easily spun to make them unique, like:
“You site is brilliant, but have you noticed that it is a bit slow in Chrome version 22.3. It takes ages to load?”
“I stumbled across your site and now I am a definite fan. I will bookmark it and give it to all my friends. Thank you for your brilliant work”
I am just wondering if you could add an option that lets the admin set a minimum number of words from the article (say anywhere from 1 to 5, excluding stop/noise words like at, the, it, am etc.) that a comment must mention, so that comments falling short would be marked as spam or put in the queue for moderation.
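To make the idea concrete, here is a rough sketch of that check in Python. Everything in it is my own illustration, not how your filter works: the `STOP_WORDS` list is a tiny made-up sample, and `content_words`, `needs_moderation` and the `min_shared` setting are hypothetical names for the admin option described above.

```python
import re

# Hypothetical stop/noise word list; a real one would be much longer.
STOP_WORDS = {
    "a", "all", "am", "an", "and", "at", "but", "by", "have", "how", "i",
    "in", "is", "it", "my", "now", "of", "that", "the", "to", "will",
    "you", "your",
}

def content_words(text):
    """Return the set of lowercase words in text, minus stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def needs_moderation(article, comment, min_shared=2):
    """True if the comment shares fewer than min_shared words with the article.

    min_shared stands in for the admin-configurable threshold (assumed name).
    """
    shared = content_words(article) & content_words(comment)
    return len(shared) < min_shared

article = "How to fix a slow SQL query in a Java application by adding an index"
vague = "I stumbled across your site and now I am a definite fan."
on_topic = "Adding an index fixed my slow SQL query, thanks"

print(needs_moderation(article, vague))     # True: no topic words shared
print(needs_moderation(article, on_topic))  # False: shares sql, query, index, ...
```

This compares exact words only, so "fixed" does not count as a match for "fix". Some form of stemming would probably be needed in practice.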
It just seems that none of these vague spam comments EVER mention the key topic words (computers, a fix, a news story character, something APART from the name of the site/url) so it could be a good way to block a lot of this new spam out quickly.
Maybe use it in your Bayesian scoring system, e.g. if it doesn’t mention at least 2 key words (SQL and Java in a computing article) then +10, if it has 2 links in it then +20, if it mentions spam words (viagra, porn, downloads) then +10, and so on.
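As a sketch of that scoring, the snippet below simply adds up the example weights from this post. It is a plain additive rule score, not real Bayesian filtering, and it makes some assumptions of mine: `spam_score`, `SPAM_TERMS` and `min_topic_hits` are made-up names, "2 links" is read as "2 or more", and links are counted by looking for `http://` or `https://`.

```python
import re

# Example spam words taken from the post; purely illustrative.
SPAM_TERMS = {"viagra", "porn", "downloads"}

def spam_score(comment, topic_words, min_topic_hits=2):
    """Add up rule weights for a comment; a higher score is more suspicious.

    topic_words is a set of lowercase key words from the article.
    """
    words = set(re.findall(r"[a-z0-9]+", comment.lower()))
    score = 0
    # +10 if the comment mentions fewer than min_topic_hits key words.
    if len(words & topic_words) < min_topic_hits:
        score += 10
    # +20 if the comment contains two or more links (assumed reading).
    if len(re.findall(r"https?://", comment, flags=re.IGNORECASE)) >= 2:
        score += 20
    # +10 if the comment mentions any known spam word.
    if words & SPAM_TERMS:
        score += 10
    return score

topic = {"sql", "java"}
print(spam_score("Free downloads at http://a.example and http://b.example", topic))  # 40
print(spam_score("The SQL query runs fine from Java now", topic))                    # 0
```

The admin could then pick the score at which a comment goes to moderation rather than straight to spam.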
Just an idea!