I don’t know that the blacklist uses regexes, and I didn’t think it currently applied to trackbacks.
Regex use is also more CPU-intensive, so I try to avoid it when fast matching will do the job. CG-AntiSpam and CG-Referrer use simple keyword matching on the domain (and links).
Unfortunately, the reality is that spammers can crank out new sites and URLs as quickly as we defeat them. CG-Referrer stops what it can at the gate before hitting WP proper, while CG-AntiSpam uses additional methods to detect possible spam posting.
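To illustrate the idea of plain keyword matching on the domain, here is a minimal Python sketch. It is not the actual CG-AntiSpam or CG-Referrer code (those are PHP plugins and their source is not shown in this thread); the keyword list and the function name `domain_is_spammy` are made up for the example.

```python
from urllib.parse import urlparse

# Hypothetical keyword list; the real plugins' lists are not shown in this thread.
SPAM_KEYWORDS = ("viagra", "phentermine", "pharmac")


def domain_is_spammy(url, keywords=SPAM_KEYWORDS):
    """Return True if any keyword appears in the URL's host name.

    Uses plain substring checks instead of a regex, and looks only at
    the host, so a keyword in the path does not trigger a match.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(word in host for word in keywords)


if __name__ == "__main__":
    print(domain_is_spammy("http://cheap-viagra-now.example.com/page"))
    print(domain_is_spammy("http://example.com/about-pharmacology"))
```

Checking only the host keeps the test cheap and sidesteps matches on ordinary words in a page path. The cost is that it misses spam URLs that hide the keyword in the path.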
SpamKarma and ReferrerKarma are also great solutions to try and stop things early on in the process.
I think the next thing, beyond just WordPress users, is going to be some kind of managed spam domain list and spamwords URL listing as well. I’m looking at setting up my list in such a way that tools can check for and retrieve updates, and so users can submit what they think is spam (referrers or comments) to get into the next update round. Maybe have a bunch of folks oversee it.
Beyond that, we need to find ways to shut these guys down. 90% of my referrer spam comes from domain names whose contacts are at one of two other domains, both of which are registered with a French registrar. The ‘front end’ domains are usually registered through cheap systems and are otherwise disconnected. It’s like a shell company. We just have to figure out the proper channels to get them shut down…
-d
Well, the blacklist check uses the preg_match() function so it seems a regular expression would work. Did you try cutting it down to just (pharmac|viagra|phentermine|pill|valtrex|zyrtec|prescription)?
ColdForged,
I had assumed it was using ereg, so I wrote my patterns accordingly. I also wanted to tailor it so that the regex would only match strings with http:// and the keyword in them, thus restricting it to URI instances. That seems to be the most reliable way I’ve found to avoid false positives.
I’ll try to re-work it using preg_match syntax instead.
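As a rough sketch of what "only match when http:// and the keyword appear together" could look like, here is a Python version using the keyword list suggested above. It is an illustration only, not the WordPress blacklist code. The name `looks_like_spam` is hypothetical, and a PHP preg_match pattern would additionally need delimiters around it.

```python
import re

# Keyword alternation taken from the suggestion earlier in the thread.
# The pattern requires "http://" followed, within the same run of
# non-whitespace characters, by one of the keywords.
SPAM_URL = re.compile(
    r"http://\S*(?:pharmac|viagra|phentermine|pill|valtrex|zyrtec|prescription)",
    re.IGNORECASE,
)


def looks_like_spam(text):
    """Return True only if a keyword occurs inside an http:// URL."""
    return SPAM_URL.search(text) is not None


if __name__ == "__main__":
    print(looks_like_spam("Buy at http://cheap-viagra.example.com now"))
    print(looks_like_spam("My doctor changed my prescription last week"))
```

With this shape, a comment that merely mentions a prescription in ordinary prose passes, while the same word inside a link is flagged. Short keywords such as "pill" can still hit innocent URLs, so the list probably needs tuning.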
davidchait,
I’m constantly talking to my host to be sure that the regexes I’m using aren’t causing any load issues, and I’m told they don’t notice anything. I also want to avoid using 3rd party plugins if the built-in functionality of WordPress will work just fine.
Thanks for the input so far guys.
Understood. The 3rd party plugins have some really advanced stuff in them that the blacklist code doesn’t, and they filter pingback/trackback spam as well (which I assume WP might internalize at some point).
CG-AntiSpam specifically culls out URLs for its spamword checks, to avoid false-positives — so you aren’t alone in that logic. 😉
-d