WP not catching comment spam HTML entities

charlesarthur
(@charlesarthur)

19 years, 4 months ago

I’m being annoyed by a comment spammer who uses HTML entities (those sequences starting with “&#”) in the name, email, web address and sometimes also the body of the spam.
I’m on WP 1.2 (not 1.2.1), with Kitten’s Spam Words and Three Strikes plugins also active.
But I’ve read elsewhere that WP should catch HTML entity attempts, and that they should be blockable. (Using entities keeps the “bad word” from being caught by the Spam Words plugin, because the plugin looks at the raw text, which is full of “&#” and “;” characters, while a browser, and presumably Google, sees the decoded “bad word”.)
Am I missing some obvious setting? Or should I upgrade? Or what?
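The core problem described above is that a word filter sees the encoded text rather than what the browser renders. A minimal sketch of the obvious fix, assuming PHP 7 or later, is to decode entities with html_entity_decode() before matching. The function name contains_spam_word_decoded() and the spam-word list are hypothetical; this is not how WordPress or the Spam Words plugin actually works.

```php
<?php
// Hypothetical helper (not part of WordPress or any plugin): checks text
// for spam words AFTER decoding HTML entities, so that
// "&#99;&#97;&#115;&#105;&#110;&#111;" is matched as "casino".
function contains_spam_word_decoded(string $text, array $spamWords): bool
{
    // Decode named and numeric entities (e.g. &#99; or &#x63; -> "c") to UTF-8.
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    foreach ($spamWords as $word) {
        $word = trim($word);
        if ($word === '') {
            continue; // skip blank lines in the word list
        }
        // Case-insensitive literal substring match (ASCII case folding).
        if (stripos($decoded, $word) !== false) {
            return true;
        }
    }
    return false;
}
```

Matching the raw string would miss the word entirely, which is exactly the loophole the spammer relies on.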


Moderator James Huff
(@macmanx)

Volunteer Moderator

19 years, 4 months ago

You don’t have to upgrade, but I would give the other available anti-spam solutions a try: http://www.tamba2.org.uk/wordpress/spam/

akc
(@akc)

19 years, 3 months ago

I’ve been plagued by this problem too (I’m on 1.2.1). Following Lucky1/bjoern, I’ve made slight modifications in the functions.php script (under wp-includes) to

1) stop processing comments where the author field has the characters &#. (On an English blog those characters are not necessary at all and are a reliable indicator of spam.)

2) stop processing comments where the author field matches a spam word: I got tired of disapproving spam.

By “stop processing” I mean the script stops running right after spam has been submitted but *before* it enters the database. Comments not fitting the above cases will be processed as usual.

I added the following as the very first line of function check_comment:

// reject any comment whose author name contains "&#"
if ( strpos($author, '&#') !== false ) die( "Oh... it's a spammer..." );

(Note that even if you have comment moderation and/or link counts turned on, a comment like this will not even be submitted for moderation.)

I also changed

if ( preg_match($pattern, $author) ) return false;

to

if ( preg_match($pattern, $author) ) die( "Oh... a spammer..." );

Again, it tells the script to stop further processing if the author field matches a spam word. This is strong stuff, so you may want to leave this out.
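The two author checks can also be sketched as one standalone function. This is a hypothetical helper (author_rejection_reason() is not a WordPress function), assuming PHP 7.1 or later. It returns a reason instead of calling die(), which makes it easier to test and leaves the caller free to choose between rejecting the comment outright and sending it to moderation.

```php
<?php
// Hypothetical stand-alone version of the two author checks.
// Returns a human-readable reason if the author should be rejected,
// or null if the comment may be processed as usual.
function author_rejection_reason(string $author, array $spamWords): ?string
{
    // Rule 1: any "&#" means the name uses numeric HTML entities.
    if (strpos($author, '&#') !== false) {
        return 'author contains HTML numeric entities';
    }

    // Rule 2: the name matches a spam word (case-insensitive).
    foreach ($spamWords as $word) {
        $word = trim($word);
        if ($word === '') {
            continue; // skip blank entries in the word list
        }
        // preg_quote() escapes regex metacharacters and the '#' delimiter.
        $pattern = '#' . preg_quote($word, '#') . '#i';
        if (preg_match($pattern, $author)) {
            return "author matches spam word '$word'";
        }
    }
    return null;
}
```

Keeping the decision separate from die() also makes it easy to soften rule 2 later, as suggested above, without touching rule 1.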

See also: http://wordpress.org/support/topic.php?id=20843

akc
(@akc)

19 years, 3 months ago

Oh, btw, since WP uses Unicode (UTF-8) encoding by default, there really isn’t much reason to use HTML numeric entities to refer to Unicode characters (e.g. &#36453;). (Minor reasons might include supporting some old, buggy browsers, or using numeric entities to enter hard-to-type Unicode characters.)

And I guess we could be a bit more precise by rejecting only those author fields containing entities for *low* ASCII characters (i.e. the usual alphabet, digits and punctuation marks). That would let through Chinese spammers spamming in Chinese, for example, but that hasn’t been a problem on my blog (only in my email!).
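The “low ASCII only” refinement could look something like the following minimal sketch, assuming PHP 7 or later; has_low_ascii_entity() is a hypothetical name. It recognises both decimal (&#83;) and hex (&#x53;) references and treats U+0020 through U+007E as “low ASCII”, so a reference to a CJK character such as &#36453; is not flagged.

```php
<?php
// Hypothetical helper: true if $author contains a numeric HTML entity
// (decimal "&#83;" or hex "&#x53;") that encodes a printable low-ASCII
// character (U+0020..U+007E). Entities for other characters pass through.
function has_low_ascii_entity(string $author): bool
{
    // Capture the digits of each numeric character reference; the trailing
    // semicolon is optional because browsers tolerate its absence.
    if (!preg_match_all('/&#(x[0-9a-f]+|[0-9]+);?/i', $author, $matches)) {
        return false; // no numeric entities at all
    }

    foreach ($matches[1] as $ref) {
        // Hex references start with "x" or "X"; everything else is decimal.
        $codePoint = ($ref[0] === 'x' || $ref[0] === 'X')
            ? hexdec(substr($ref, 1))
            : (int) $ref;

        if ($codePoint >= 0x20 && $codePoint <= 0x7E) {
            return true; // encodes an ordinary letter, digit or punctuation mark
        }
    }
    return false;
}
```

Compared with a plain “&#” check, this lets legitimate non-ASCII names written as entities through, at the cost of a slightly more involved regular expression.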

autumnqiu
(@autumnqiu)

19 years, 2 months ago

To WordPress Developers,

Just wondering whether the above code has been implemented in WordPress 1.5 Strayhorn? (As in, commenters are not allowed to entity-encode their comments, as a form of spam protection.)

Thanks!

pericat
(@pericat)

19 years, 2 months ago

They’ve put in code to catch and evaluate HTML entities. I saw it earlier today.