Lots of "/?s=" in log files

  • In the last few days I noticed that a number of different IP addresses are frequently hitting my WordPress site at “/?s=” and “/?p=741”. The first URL is obviously related to WordPress’ search facility, but because the search is empty, it just presents the site as if my browser were accessing the root of the site. The second is a valid post I made in May 2010. The “/?s=” hits are significantly more frequent than the “/?p=741” hits.

    Looking back through my logs, it appears that the hits are coming from numerous IP addresses all over the world, though most of them belong to hosting companies that offer VPS or web hosting. In the past the User-Agent strings appeared to be randomized and included numerous versions of different browsers. In the last few days the User-Agent strings seem to be limited to the following two:

    – “Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:21.0) Gecko/20100101 Firefox/21.0”
    – “Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:22.0) Gecko/20100101 Firefox/22.0”

    Since my site is generally not Mac-related, I started investigating the hits. I am seeing them as far back as March of this year but as I go further and further back in time the hits become less frequent and blend in more and more with the other regular traffic.

    I have had a few name-based virtual hosts on this server for quite some time now and the hits are only on one of them (coincidentally the virtual host that is longest running and also on https). Interestingly, it is not the default virtual host.

    Does anybody have any idea what this traffic is and whether it may be maliciously intended?

Viewing 5 replies - 1 through 5 (of 5 total)
  • I can’t say for sure with the information you have provided, but given that the links you’ve listed are actual pages on your site (and there is no apparent URL manipulation for SQL injection being passed), it doesn’t look malicious. It is possible they are bot related (which could be search engines, etc.). Can you post the complete, relevant lines from your log?

    Is “/?p=741” still a page on your site? How many of these hits are you seeing, and out of how many hits in general?

    If it really worries you, you can always install the Block Bad Queries (BBQ) plugin. I use it on all my WP sites.

    Hey Doug,

    Here’s a pastebin of my traffic since noon today.

    http://pastebin.ca/2456446

    I should note that this Apache access.log specifically excludes traffic from my browser and it excludes most common search engines (though you will see that Wayback and mail.ru bots are both there). Of the 283 entries since noon, 197 of them were to /?s= and 63 were to /?p=741, which means that about 91.9% of my traffic in the last 6.5 hours has been these URLs. To get an idea of scale, my searchengine log file has only 8 entries in the same time period.
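
    A minimal sketch of how such a tally could be reproduced from an Apache access log, assuming the default common/combined log format. The function names `tally_paths` and `share` are hypothetical, and the demo at the bottom only reuses the counts quoted above (the "other" bucket is simply 283 minus 197 minus 63).

```python
import re
from collections import Counter

# Matches the quoted request line, e.g. "GET /?s= HTTP/1.1".
# Assumption: the log uses Apache's default common/combined format.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def tally_paths(lines):
    """Count requests per requested URL (path plus query string)."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def share(counts, paths):
    """Percentage of all counted requests that went to any of `paths`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * sum(counts[p] for p in paths) / total


if __name__ == "__main__":
    # Counts taken from the post; "other" is the remainder of the 283 entries.
    counts = Counter({"/?s=": 197, "/?p=741": 63, "other": 23})
    print(f"{share(counts, ['/?s=', '/?p=741']):.1f}%")
```

    On a real log, `tally_paths(open("access.log"))` would produce the counts; the file name here is only a placeholder.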

    The reason I don’t think they are legitimate search engine bots is because the IP addresses that are hitting these URLs are hitting no other pages on my site. The /?p=741 URL is a legitimate post of mine though it is only one short paragraph and has two links (one of which is a dead local link, after having been relocated a long time ago).
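
    The "these IPs hit nothing else" observation can be checked mechanically. The following is only a sketch under the same assumption (default Apache log format); `paths_by_ip` and `single_purpose_ips` are hypothetical helper names, and the two suspect URLs are the ones named in this thread.

```python
import re
from collections import defaultdict

# Client IP at the start of the line, then the quoted request line.
# Assumption: Apache's default common/combined log format.
LINE_RE = re.compile(r'^(\S+) .*?"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

# The two URLs discussed in this thread.
SUSPECT_PATHS = {"/?s=", "/?p=741"}


def paths_by_ip(lines):
    """Map each client IP to the set of URLs it requested."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            seen[match.group(1)].add(match.group(2))
    return seen


def single_purpose_ips(lines, suspect_paths=SUSPECT_PATHS):
    """Return the IPs whose every request went to one of `suspect_paths`."""
    suspects = set(suspect_paths)
    return sorted(
        ip for ip, paths in paths_by_ip(lines).items() if paths <= suspects
    )
```

    An IP that also fetched any other page is left out of the result, which separates ordinary visitors and crawlers from clients that only ever request the two URLs.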

    Unfortunately I don’t think that BBQ will be effective in treating these requests since both URLs are valid (even if the /?s= request is redundant). I started compiling a list of IPs, which has since turned into a list of the network segments they are part of, with the intention of perhaps blocking them at my firewall. I didn’t think this was going to be an issue, because they were all coming from IP ranges owned by VPS provider companies like webexxpurts.com and ovh.com, but I found one that is listed as att.net, which sounds like it could contain actual customer IPs.
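
    Turning a list of IPs into candidate network segments can be roughly approximated with Python's standard `ipaddress` module. This sketch is not the lookup described above: it assumes a fixed /24 grouping instead of the real allocations a whois query would return, and `group_by_prefix`, `candidate_blocks`, and the `min_hits` threshold are hypothetical.

```python
import ipaddress
from collections import Counter


def group_by_prefix(ips, prefix_len=24):
    """Count addresses per enclosing network of the given prefix length.

    Assumption: a fixed prefix stands in for the provider's real allocation.
    """
    counts = Counter()
    for ip in ips:
        # strict=False masks off the host bits, e.g. 192.0.2.17/24 -> 192.0.2.0/24
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[network] += 1
    return counts


def candidate_blocks(ips, prefix_len=24, min_hits=2):
    """Networks seen at least `min_hits` times, with adjacent ones merged."""
    networks = [
        net for net, hits in group_by_prefix(ips, prefix_len).items()
        if hits >= min_hits
    ]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]
```

    Requiring more than one hit per network before listing it is one way to avoid blocking a whole range, such as a residential ISP block, because of a single address.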

    I’m mildly concerned about it being a virus that other people might be infected with, but frankly if it is a virus then it appears to be mostly harmless (other than irritating traffic levels).

    Any thoughts?

    Thanks,
    Snork.

    BBQ would be used to prevent actual malicious requests; it obviously wouldn’t catch these because they’re valid and not malicious.

    With the big, big caveat that I’m not a security expert (but I do take security seriously, and my workplace of the last 10 years involves working on sites with 60-80 million page views a month, so we’ve seen just about everything :), I don’t see anything to worry about. The how and why of these requests is probably impossible to know. It’s obviously very strange, but I’m not sure what you can do other than block IPs. I’m also not sure what you’d gain by blocking valid (though unusual) requests. Anyway, perhaps someone else will chime in.

    Interestingly, last night I started getting hits on /?paged=2 from 113.64.81.138 with a TencentTraveler user-agent… then this morning it switched to a few different IE agent strings. I have had 35 hits on it from that IP since midnight (about 8.5 hours) and can’t imagine that someone is really that interested in the posts I made from late June to late July but is not interested in hitting any other pages on my site. Also, I do not have a single hit on /?paged=2 from any other IP address for the last few days. 🙁

    TencentTraveler is actually a legitimate User-Agent (apparently the most popular browser in China), but more than likely a bot/spider/crawler is spoofing it (as well as other popular UAs). You can use .htaccess if you want to block UAs or IP blocks. Someone mentioned they just block all IPs from China. You may find yourself playing “whack-a-mole,” but if it concerns you, you can ban IPs/UAs as you have issues.

    The important thing is to make sure your site is hardened so actual malicious attacks can do no harm. I generally don’t worry much about non-malicious behavior unless we’re getting hammered and it affects server performance. We did get hit hard a few weeks ago by some bots, but they were attempting every known vulnerability under the sun (including many malicious URLs). No damage done, as our sites are heavily locked down as far as security goes, but we did block the IPs because the traffic was excessive.
