Spam: Kill it at the root

Posted November 17, 2003 by Dougal Campbell. Filed under Meta.

Matt and I were chatting briefly earlier today about blog comment spam, and about an article that Mark Pilgrim posted on his site. Mark is pretty pessimistic about the outlook for anti-comment-spam efforts. And he points out the great lengths to which spammers will go in order to get their spams out to where eyeballs will see them. Both usenet and email are awash with the refuse of spammers, because for some reason, there are people out there who respond to their messages.

However, I, like Sam Ruby, believe that the outlook isn’t quite as grim for blogs as it has been for usenet and email. Blogs are fundamentally different in how they work and in how people use them.

It was easy for spammers to abuse usenet, because it is a store-and-forward system designed specifically for distributing messages to a large number of leaf nodes. You post a message into a newsgroup, and your message is replicated around the world, on thousands of other usenet servers, viewed by millions of users. Clients and servers speak a simple protocol, NNTP, and though most servers these days require authentication before you can post a message, there are still open servers to be found, if you know how to look.

Likewise, email is a store-and-forward system, with a method of relaying messages from one server to another. Unlike usenet, email is designed to deliver messages directly to a particular person, which makes it slightly less efficient for reaching large numbers of users. But only slightly, because it’s quite easy to specify multiple recipients for a message. So the spammer can deliver a single message to a server and instruct the server to deliver it to hundreds of different people. And again, many modern servers attempt to prevent unauthorized users from relaying messages, but there are still plenty of vulnerable servers on the internet that the spammers can find.

The relative ease and extreme low cost of reaching large numbers of consumers have made usenet and email into juicy targets for spammers. Low-hanging fruit, if you will. Spammers have also targeted other services, such as instant messaging services, pagers, cell phones, and more recently, blogs.

Obviously, I have a great interest in the problem of spam on blogs. I run multiple personal blogs, and I’m a developer for the WordPress blogging software. Spammers have hit my blogs on multiple occasions recently, and I delete the spam comments as soon as I become aware of them. But automating the process to some degree would certainly allow me to relax more. Fortunately, blogs are fundamentally different from usenet and email in several ways, and I think that these differences may allow us to stop this spam problem before it really gets bad. There’s still a chance for us to kill this weed at the root, before it has a chance to grow.

Why do spammers post comments on blogs? Do they really expect blog owners to leave their comments up for people to read? Do they expect people to follow their links to the spammers’ sites selling viagra and porn? Not really. What they are really after is search engine ranking: the more links to their sites that a search engine sees, the better the chance that their site will come up in a list of search results. Even if you’ve deleted the original comment, it won’t matter if a search engine has already indexed the links in the comment before you deleted it. So one of our anti-spam tactics should be to bust up the search engine ranking for spammers’ links.

And once a spammer has given us the link to his page and we see it for what it is, we can store that information in a blacklist, and use it to block future messages that try to link to that site. And if we can make a common, distributed, authenticated way to share such blacklist info, we can be pretty effective at staunching the flow of the spam. And if we can team up now, while this problem is still young, we can basically create an environment where comment spam just isn’t cost-effective for the spammers.
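A minimal sketch of the blacklist idea, in Python, assuming the shared blacklist is simply a set of domain names. The names `BLACKLIST`, `should_block` and `learn_from_spam` are hypothetical, and the distributed, authenticated sharing part is left out entirely; this only shows the local check against links in a comment.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist: domains already seen in spam comments.
BLACKLIST = {"spam-pills.example", "cheap-casino.example"}

# Rough matcher for http(s) links in comment text; not a full URL parser.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_hosts(comment: str) -> set[str]:
    """Return the lowercase host names of every http(s) link in a comment."""
    hosts = set()
    for url in URL_RE.findall(comment):
        host = urlparse(url).hostname
        if host:
            hosts.add(host.lower().rstrip("."))  # drop a sentence-ending dot
    return hosts


def is_blacklisted(host: str, blacklist: set[str]) -> bool:
    """True if host is a blacklisted domain or a subdomain of one."""
    parts = host.split(".")
    # For "www.a.example" this checks "www.a.example", "a.example", "example".
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


def should_block(comment: str, blacklist: set[str] = BLACKLIST) -> bool:
    """Block the comment if any link in it points at a blacklisted domain."""
    return any(is_blacklisted(h, blacklist) for h in extract_hosts(comment))


def learn_from_spam(comment: str, blacklist: set[str]) -> None:
    """After a human marks a comment as spam, remember every host it linked to.

    Caution: this also records any innocent host the spam happened to link,
    so in practice the additions deserve a human review before sharing.
    """
    blacklist.update(extract_hosts(comment))
```

Matching on the domain rather than the full URL means a spammer can't dodge the list just by changing the path or adding a `www.` prefix.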

Blogrolling Hack Illustrates Need for Decentralization

Posted by Matt Mullenweg. Filed under Development.

This morning it seems that sites that manage their blogrolls using blogrolling.com’s service had their links hijacked, every link being replaced by one to “Laura’s Blog”, which predictably redirects to a porn site. As painful and unfortunate as this is, I think it illustrates an important point: as a weblogging community, we should be heading away from centralization as a rule, not flocking to every free or low-cost centralized service that pops up.

To me one of the greatest things about weblogs is that they shift power and control away from monolithic organizations and into the hands of users, where it is ultimately more secure. I have a friend who lost three years of her writing when a free online journal service decided to fold and delete everyone’s entries. I know people who hardly use email because their hotmail or yahoo addresses are flooded with so much spam as to make them useless. People who don’t host their own comments have their discussion at the mercy of some third party provider of varying reliability. Many of you reading this had your blogrolls hijacked this morning. In the weblog world blogroll links represent a web of trust — you freely giving a piece of your credibility to another site as a gift to that site and your audience. Today that trust was betrayed for many people.

This isn’t meant to criticize the fine people behind blogrolling.com at all. Realistically, anyone can be hacked and most people have been at some point. However the principle of the matter is that this shouldn’t have been a problem in the first place; it shouldn’t have rocked the weblog world like it did. How to change? Host your blogroll yourself. This is why WordPress’ links feature offers weblogs.com XML support, an unlimited number of blogrolls and links, OPML import (so you don’t have to re-enter all your links), and a handy bookmarklet — all for free. Even if you don’t use WordPress, please at least consider moving to a decentralized method of managing your blogroll.
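OPML import boils down to reading `<outline>` elements out of an XML file. A rough Python sketch, assuming the common blogroll convention of an `htmlUrl` (or `url`) attribute plus `title`/`text`; it is not WordPress’s importer, and exporters differ in which attributes they fill in.

```python
import xml.etree.ElementTree as ET


def links_from_opml(opml_text: str) -> list[tuple[str, str]]:
    """Return (name, url) pairs for every outline entry that carries a link."""
    root = ET.fromstring(opml_text)
    links = []
    # iter() also walks nested outlines, so blogrolls grouped into folders work.
    for outline in root.iter("outline"):
        url = outline.get("htmlUrl") or outline.get("url")
        if not url:
            continue  # a folder/category node, not a link
        name = outline.get("title") or outline.get("text") or url
        links.append((name, url))
    return links
```

Because the result is plain (name, url) pairs, it can be fed into whatever local link storage you use, which is the whole point: the list lives on your own site.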

Save as Draft button

Posted November 6, 2003 by Matt Mullenweg. Filed under Development.

As a first step in making the administration and posting interfaces as elegant as possible, I’ve started implementing some of the ideas I’ve had as a very frequent user of WordPress, along with some that came up in a discussion with Matthew Thomas. Last night I checked in code that is essentially a “save” button for your online editing. Now, instead of publishing a post directly to your blog, the default button on the post form stores it as a draft in the database and takes you back to editing it. When I write long entries in the browser I’m always paranoid, because I have lost too many words to the hostile environment of browser editing. However, with the new WordPress quicktag code, I’m more comfortable writing on the post screen than just about anywhere else, so every couple of minutes I would set the post status to draft, submit the post, and then click back to edit it and continue working. The new button eliminates those steps, will prevent accidental postings (and pingings), and will hopefully save a few posts.

New Forum RSS feeds

Posted November 5, 2003 by Matt Mullenweg. Filed under Development, Meta.

Alex hacked the RSS functionality I wrote for the support forums and added a feed that shows the last 20 posts in any forum. Also as always you can get an RSS representation of a thread by adding /rss/ to the end of the URL or following the link at the top of every thread. Enjoy!
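For anyone scripting against the forum feeds, a small Python sketch: building a thread’s feed URL by appending /rss/, as described, and pulling item titles out of the result. It assumes a path-style thread URL with no query string, and an RSS 2.0 style `<item><title>` layout; neither detail is confirmed by the post.

```python
import xml.etree.ElementTree as ET


def thread_feed_url(thread_url: str) -> str:
    """Build a thread's feed URL by appending /rss/.

    Assumes a path-style thread URL with no query string or fragment.
    """
    return thread_url.rstrip("/") + "/rss/"


def item_titles(rss_text: str) -> list[str]:
    """Pull item titles out of an RSS document (RSS 2.0 layout assumed)."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title", default="") for item in root.iter("item")]
```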

Built In Statistics

Posted November 4, 2003 by Matt Mullenweg. Filed under Development.

There is some good discussion starting in the forums with regard to including some statistics functionality in WordPress. I think this is a very useful idea that a lot of people would benefit from. Feel free to hop in the discussion if you have any ideas regarding this.

There have been some more improvements to the new calendar code, including tweaking the style a bit (hat tip: Dunstan) and integrating Alex’s tooltip code back in. There are probably going to be some more tweaks over the next day or so to get the code as fine-tuned and efficient as possible. After that’s done, I’m going to look into taking it to the next step, so that a similar function could generate a calendar view similar in spirit and function to what Mark Pilgrim uses. If you have any ideas regarding this, feel free to chime in on the forums.

Comments Closed

Posted by Matt Mullenweg. Filed under Meta.

After some discussion, we’ve decided that the signal-to-noise ratio in comments on this blog tends to be low, and much of the discussion that takes place here really belongs in the support forums, which exist specifically for that purpose. Old comments will be preserved, but new comments on all entries have been closed, and new entries will have comments off except in certain circumstances. Trackbacks and pingbacks will remain on, though, so feel free to provide feedback through those mechanisms.

New Calendar Code, Updates

Posted by Matt Mullenweg. Filed under Development.

The legacy calendar code we had was a little messy, so over the past day or so I recoded it from scratch, making it cleaner and easier to understand. The XHTML of the calendar itself is now semantically richer table markup, inspired in part by Dunstan Orchard and with some guidance from Joe Clark. You now also call the calendar as a template tag and not an include. The CSS for the calendar has been revamped in the default template as well, and you can see it in action on the test blog.
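To illustrate the kind of table markup a calendar template tag can produce, here is a Python sketch built on the standard `calendar` module. The function name `calendar_table`, the class names, the Sunday week start, and the /YYYY/MM/DD/ link format are hypothetical choices for illustration, not WordPress’s actual output.

```python
import calendar


def calendar_table(year: int, month: int, post_days: set[int]) -> str:
    """Render one month as table markup; days that have posts become links.

    Assumptions: weeks start on Sunday, and /YYYY/MM/DD/ is a hypothetical
    permalink format.
    """
    cal = calendar.Calendar(firstweekday=calendar.SUNDAY)
    # Column order follows firstweekday; day_name is indexed Monday=0.
    names = [calendar.day_name[(calendar.SUNDAY + i) % 7] for i in range(7)]
    out = ['<table class="calendar">',
           f"<caption>{calendar.month_name[month]} {year}</caption>",
           "<thead><tr>"]
    # scope ties each header to its column; abbr keeps the full day name.
    out += [f'<th scope="col" abbr="{n}" title="{n}">{n[0]}</th>' for n in names]
    out.append("</tr></thead><tbody>")
    for week in cal.monthdayscalendar(year, month):
        cells = []
        for day in week:
            if day == 0:
                cells.append('<td class="pad"></td>')  # day outside this month
            elif day in post_days:
                href = f"/{year:04d}/{month:02d}/{day:02d}/"
                cells.append(f'<td><a href="{href}">{day}</a></td>')
            else:
                cells.append(f"<td>{day}</td>")
        out.append("<tr>" + "".join(cells) + "</tr>")
    out.append("</tbody></table>")
    return "\n".join(out)
```

For example, `calendar_table(2003, 11, {4, 17})` yields a six-week November 2003 grid with links on the 4th and 17th. Using `<caption>`, `<thead>` and `scope` rather than plain cells is what makes the markup “semantically richer”: the structure is carried by the table itself, leaving presentation to CSS.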

In other news, the new permalink structure is being implemented throughout the code, which is actually cleaning up a lot of things by moving redundant code into the new functions. There have also been several bugfixes related to that code. Discussion has renewed on implementing Smarty templates and the best way to go about that. Finally, we’ve gotten some great usability input from Matthew Thomas, which will most certainly make its way into the next release’s administration interface.


