Comment spam prevention

scottallen
(@scottallen)

20 years, 3 months ago

Well, it’s happening — comment spam (already!)
At a bare minimum, we need an option to disallow hyperlinks in comments and have it strip out any A HREF and /a tags.
Beyond that…
– word filtering
– blacklisting
– other suggestions?
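
For illustration, the "strip out any A HREF and /a tags" option could be sketched roughly as below. This is a minimal Python sketch, not WordPress code; the function name `strip_links` is hypothetical, and a regular expression is only a rough approximation of real HTML parsing.

```python
import re

# Matches opening <a ...> tags and closing </a> tags, case-insensitively.
# The optional group requires whitespace after "a", so tags such as
# <abbr> are left alone.
ANCHOR_TAG = re.compile(r"</?a(?:\s[^>]*)?>", re.IGNORECASE)


def strip_links(comment_html: str) -> str:
    """Remove anchor tags but keep the text that was inside them."""
    return ANCHOR_TAG.sub("", comment_html)
```

The link text stays in the comment; only the markup that makes it clickable is dropped.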


otaku42
(@otaku42)

20 years, 3 months ago

@other suggestions: comment moderation is already available, and automated comment moderation will be available soon. Blacklisting and Bayesian filters are planned.
Disallowing hyperlinks is (in my eyes) a bad idea. It would break not just the spam, but also other useful links. Likewise, it would hurt not just the spammers’ Google ranking, but also the rank of other (wanted) sites. And as Phil Ringnalda wrote in a comment to this blog post:

Something to consider, while looking for a way out of the spam nightmare we are in right now: anything (including not linking legit commenter’s URLs) which makes you have less impact on Google is a win for them.

By the way: you might be interested in joining the BlaM project (http://blam.sf.net), even if it is very silent currently. I’m aiming to get some things done as soon as I’m finished with the stuff I’m working on at my business job (hopefully by the end of this month).
Bye, Mike

Anonymous

20 years, 3 months ago

where is the comment moderation? what build first contained it?

Cena (a11n)
(@cena)

20 years, 3 months ago

It’s in 1.0, under options -> general blog settings.

antifuse
(@antifuse)

20 years, 3 months ago

Dammit, I should have shut my big mouth… About 10 minutes after that comment I made, my first WP comment spam! :/

Sushubh
(@sushubh)

20 years, 3 months ago

always allow from this person
now how would the backend make sure the comments made will be made by the same individual? :S

jeremiah
(@jeremiah)

20 years, 3 months ago

Sushubh: Good question, lol. Maybe a ‘smart’ cookie? At the same time that you remember the information for the commenter (name, e-mail, etc.), you could track the major parts of the IP address.
If the name, e-mail, and IP range match, then allow the post to come through unmoderated.
Or, if you don’t display the e-mail address on the site, then you could just do a check to see if the e-mail and the name match your records. The spambots shouldn’t be able to circumvent that.
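
A rough sketch of the check described above, in Python. Everything here is an assumption for illustration: the `KnownCommenter` record, the `is_trusted` helper, and the choice of the first three octets as the "major parts" of an IPv4 address are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownCommenter:
    name: str
    email: str
    ip_prefix: str  # leading octets of the address, e.g. "203.0.113"


def ip_prefix(ip: str, octets: int = 3) -> str:
    """Return the leading octets of a dotted IPv4 address."""
    return ".".join(ip.split(".")[:octets])


def is_trusted(name: str, email: str, ip: str, known: set) -> bool:
    """True if name, e-mail, and IP range all match a stored commenter."""
    candidate = KnownCommenter(
        name.strip().lower(), email.strip().lower(), ip_prefix(ip)
    )
    return candidate in known
```

A comment for which `is_trusted` returns False would simply go to the moderation queue rather than being rejected.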

otaku42
(@otaku42)

20 years, 3 months ago

@”allow from this person”: currently there is no safe way to distinguish people (or prevent others from faking their identity). This will last until there is a “register before posting a comment” function available, which I personally would dislike a lot. But the base idea (disabling links until comment gets approved) would be a nice feature. I put that on the “To Do list” 🙂

thatadamguy
(@thatadamguy)

20 years, 3 months ago

How about this:
– Comments from registered users are posted immediately
– Comments from unregistered users are queued for review
Would this be possible / desirable?

Lester Chan
(@gamerz)

20 years, 3 months ago

Erm, just not allowing people to comment on posts that are older than 30 days will combat most of the spam. At least that works for me.
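
As a rough illustration of the 30-day cutoff, here is a minimal Python sketch. The function name `comments_open` and its parameters are hypothetical, not an existing WordPress setting.

```python
from datetime import datetime, timedelta


def comments_open(published: datetime, now: datetime, max_age_days: int = 30) -> bool:
    """Accept new comments only while the post is at most max_age_days old."""
    return now - published <= timedelta(days=max_age_days)
```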

antifuse
(@antifuse)

20 years, 3 months ago

I like getting comments on old posts… often it gives me material for new posts 🙂 I like ThatAdamGuy’s idea.

thatadamguy
(@thatadamguy)

20 years, 3 months ago

> Erm, just not allowing people to comment on posts that are older than
> 30 days will combat most of the spam. At least that works for me.
Like Antifuse, I like getting comments on old entries. In my case, I actually already get MORE (legitimate) comments in my old entries because some of them have become quite popular via Google searches. The last thing I want is for people to discover my blog via a keyword search and discover that the entry that most interests them is closed to commenting.
However, I do realize that comment spammers tend to gravitate towards the old entries. And while this might be over-complicated, maybe a points system is in order, sort of like how some e-mail anti-spam systems work (e.g., SpamAssassin)?
A comment would get ‘points’ if:
– it’s posted to an entry older than [x] days
– it is spam based on a Bayesian measurement
– it is by someone with an unrecognized e-mail address (someone who hasn’t posted before)
– it was posted less than [x] seconds after a previous post by the same IP address or with the same URL
… and all comments with more than a certain number of points would get rejected as spam. Actually, what’d be really amusing is if they were “posted” but only visible to the poster himself/herself (based upon cookies/IP addresses, etc.) so the spammer’d THINK the note got posted, but no one else would see it.
Sure, this would take some configuring, but the neat part is that — in its multi-prongness — it’d be ridiculously difficult for spammers to defeat! They wouldn’t know which specific features were triggering the refusal of their spam, and so it’d be quite hard for them to adapt.
What do you think?
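
To make the points idea concrete, here is a minimal Python sketch. The weights, the thresholds, and the names `CommentSignals`, `spam_points`, and `is_spam` are all assumptions for illustration; the Bayesian probability is assumed to come from some separate filter that is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommentSignals:
    post_age_days: int
    bayes_spam_probability: float  # 0.0 to 1.0, from a separate filter
    known_email: bool  # has this e-mail address posted before?
    seconds_since_last: Optional[float]  # gap since last post by same IP/URL


def spam_points(s: CommentSignals, old_post_days: int = 30,
                min_gap_seconds: float = 60.0) -> int:
    """Add up points for each suspicious signal (weights are arbitrary)."""
    points = 0
    if s.post_age_days > old_post_days:
        points += 1
    if s.bayes_spam_probability > 0.9:
        points += 2
    if not s.known_email:
        points += 1
    if s.seconds_since_last is not None and s.seconds_since_last < min_gap_seconds:
        points += 1
    return points


def is_spam(s: CommentSignals, threshold: int = 2) -> bool:
    """Reject comments with more than `threshold` points."""
    return spam_points(s) > threshold
```

A first-time commenter on an old entry scores 2 points here and still gets through; it takes several signals together to cross the threshold, which is the point of the multi-pronged approach.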

Moderator Matt Mullenweg
(@matt)

20 years, 3 months ago

TAG, that’s exactly the kind of thing I had in mind. Maybe in 1.1, we’ll see.

Moderator Matt Mullenweg
(@matt)

20 years, 3 months ago

And expanding on the date idea, it would make sense to give points to a comment left on an entry that is no longer on the front page. Other possible tests for points:
* More than two dashes in domain name
* Never left approved comment before (match URI or email or both)
* Keywords (viagra, phene*, casino, explicit words)
* What else?
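
Two of these tests could be sketched in Python as follows. The helper names and the pattern list are hypothetical; the wildcard handling uses shell-style matching from the standard `fnmatch` module, and the dash check only works for URLs that include a scheme such as `http://`.

```python
import fnmatch
import re
from urllib.parse import urlparse

# Illustrative patterns only; "*" matches any run of characters.
KEYWORD_PATTERNS = ["viagra", "phene*", "casino"]


def too_many_dashes(url: str, max_dashes: int = 2) -> bool:
    """True if the host part of the URL has more than max_dashes dashes."""
    host = urlparse(url).hostname or ""  # empty if the URL has no scheme
    return host.count("-") > max_dashes


def has_spam_keyword(text: str, patterns=KEYWORD_PATTERNS) -> bool:
    """True if any word in the text matches one of the keyword patterns."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return any(fnmatch.fnmatchcase(w, p) for w in words for p in patterns)
```

Each function would contribute points to a comment's score rather than reject it outright.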

NuclearMoose
(@nuclearmoose)

20 years, 3 months ago

Whoa! Matt, Adam…I was thinking the EXACT SAME THING TOO! What are the odds of that? 🙂
Craig.

thatadamguy
(@thatadamguy)

20 years, 3 months ago

It’s great that so many of us are on the same page, apparently!
Matt… one thing I would, however, recommend against is too much separate keyword parsing. As we know from e-mail, spammers have a remarkable tendency to alter their wording (v1agra, expand your ‘member’, f*ree, etc.), and that merely becomes an arms race. Additionally, there’s simply too much overlap in language (people joke about viagra, folks may use harsh language in non-spam comments, and so on).
Instead, I’d focus more on the behavioral / process aspects you’ve highlighted (never left approved comment before, comment on entry no longer on front page, etc.).
In terms of filtering priorities, I’d like to humbly suggest that the e-mail / URI test is the biggest one, in my opinion. Especially if we assume e-mail addresses are not publicly listed (my preference), this then becomes a user-friendly password system of sorts that would be onerous for the spammer to thwart.