So this isn't so much a support request as me trying to get something out there about what's happening to me. Based on a lot of research and log reading, the only conclusion I can come to is that the site I manage is being attacked by some kind of botnet. The attack consists of flooding WP with requests for every monthly archive we have until the server is totally overloaded and needs to be restarted.
I have two 8-core dedicated servers running the site, one for Apache and one for MySQL. We get ~11k hits per day normally, and the load during normal traffic is around 1.5-3. (The db server has really low load most of the time but spikes to around 4 during updates and stuff. That's out of 8 cores though, so on some level that's a load of 0.5.)
We have 49,878 posts in the db, which means that any kind of non-cached pageload is incredibly slow and intense, and that these bots showing up and asking for every /2008/05 type URL all at once bring it to its knees. I have monit ( http://mmonit.com/monit/ ) installed on the server, so when the load spikes up to 15 it restarts Apache, which stops the whole thing from going down for an hour but also causes tons of insane behavior (lost posts, murdered pageloads etc).
This has been going on for months. I've been working around it with Monit and other optimizations, but at the core my problem is that every 5 minutes or so a DIFFERENT computer comes by and loads all the pages at once. I've been tracking the IPs and they seem to change each time it happens, as well as being in totally different parts of the world and in different kinds of organizations. My suspicion is that they are just infected Windows machines.
Here are a couple of examples from my apache logs. Note that in all cases I get 20-30 similar requests for different months all at once from the same IP/user-agent:
126.96.36.199 - - [23/Mar/2009:12:59:09 -0400] "GET /2008/03/ HTTP/1.1" 200 128651 "-" "Mozilla/4.0 (compatible;)"
188.8.131.52 - - [23/Mar/2009:15:06:29 -0400] "GET /2009/02/ HTTP/1.1" 200 126646 "-" "Mozilla/4.0 (compatible;)"
184.108.40.206 - - [23/Mar/2009:13:38:14 -0400] "GET /2004/12/ HTTP/1.1" 200 107751 "http://globalvoicesonline.org/2008/02/25/barbados-carbon-footprint/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; InfoPath.1)"
220.127.116.11 - - [23/Mar/2009:17:14:22 -0400] "GET /2008/06/ HTTP/1.1" 200 125667 "http://globalvoicesonline.org/2009/02/14/barbados-trinidad-tobago-clico-questions/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.04506.30)"
18.104.22.168 - - [24/Mar/2009:12:13:39 -0400] "GET /2007/02/ HTTP/1.1" 200 120982 "http://globalvoicesonline.org/2009/01/21/japan-coming-of-age-in-2009/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
22.214.171.124 - - [23/Mar/2009:17:14:30 -0400] "GET /2008/12/ HTTP/1.1" 200 122549 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
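In case it helps anyone check their own logs for the same pattern, here is a rough Python sketch that looks for IPs requesting lots of monthly archive URLs in a short span. It assumes Apache's "combined" log format like the entries above. The function name `find_archive_bursts`, the 20-hit threshold and the 60-second window are just illustrative choices of mine, not anything official, so tune them to your own traffic.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Apache "combined" format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Monthly archive URLs such as /2008/05/ only; deeper paths (single posts) don't match
ARCHIVE_RE = re.compile(r'^/\d{4}/\d{2}/?$')


def find_archive_bursts(lines, min_hits=20, window=timedelta(seconds=60)):
    """Return {ip: largest number of archive requests seen inside `window`}
    for every IP that reaches `min_hits` (thresholds are assumptions)."""
    hits = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if not m or not ARCHIVE_RE.match(m.group("path")):
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        hits[m.group("ip")].append(ts)

    bursts = {}
    for ip, times in hits.items():
        times.sort()
        start = 0
        best = 0
        # Sliding window over the sorted timestamps for this IP
        for end, t in enumerate(times):
            while t - times[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_hits:
            bursts[ip] = best
    return bursts
```

You would call it with an open log file, e.g. `find_archive_bursts(open("access.log"))`, where the path is whatever your Apache access log happens to be. It only reports what already happened, so it's a diagnostic aid rather than a defense.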
One of the weirdest things is how they mostly have the "Mozilla/4.0 (compatible;)" user-agent, but in a lot of cases they have all kinds of weird mixes of MSIE6, InfoPath etc. It seems like maybe this is just because whatever spyware is coordinating the attack is using the default browser on the owned box, thus registering whatever user-agent the box would normally give when someone was browsing.
Given the patterns though, I don't see how it could possibly be real traffic, and I don't see any reason to think that it could be a real search engine crawler or anything, especially given how many different places it's coming from.
SO: Has anyone seen anything like this before? Any ideas?
I'm going to work on Apache/.htaccess methods for curtailing this, but it's really hard because it looks so much like real traffic. I'll probably be spending some time with mod_evasive to see if that can help.
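To think through what kind of thresholds might work before touching the real config, here is a minimal Python sketch of the general idea behind per-IP rate limiting: count each IP's recent requests and temporarily block it once it goes over a limit. To be clear, this is NOT how mod_evasive is implemented. The class name `PerIPRateLimiter` and all the numbers are hypothetical and only illustrate the concept.

```python
from collections import defaultdict, deque


class PerIPRateLimiter:
    """Reject an IP for a while once it exceeds max_requests within window_seconds.
    All defaults are illustrative guesses, not recommended values."""

    def __init__(self, max_requests=10, window_seconds=10.0, block_seconds=300.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._recent = defaultdict(deque)  # ip -> timestamps of recent requests
        self._blocked_until = {}           # ip -> time at which the block expires

    def allow(self, ip, now):
        """Return True if the request should be served, False if rejected."""
        if self._blocked_until.get(ip, 0.0) > now:
            return False
        recent = self._recent[ip]
        recent.append(now)
        # Drop timestamps that have fallen out of the window
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) > self.max_requests:
            self._blocked_until[ip] = now + self.block_seconds
            recent.clear()
            return False
        return True
```

Since each burst here is 20-30 requests from one IP all at once, a per-IP limit like this could in principle cut a burst short after the first handful of requests, even though the next burst comes from a different address. The risk is the one already mentioned: a limit tight enough to catch the bots may also catch real visitors.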