
site overwhelmed (shut down) by spiders and feeders

  • I have four sites, each with its own installed code base, on one hosting account. My sites all go down at the same time, from time to time. This last time my host was able to tell me: “your site was aggressively retrieved by several indexing/searching sites at one time, Yahoo! DE Slurp, Yahoo! Slurp, msnbot, msnbot-media, Sphere Scout, Feedfetcher-Google, Baiduspider+, Moreoverbot, ScoutJet, Googlebot”

    I would restart the server and it would immediately go down again due to the host's limit of 15 concurrent MySQL connections.

    I watch my traffic with Google Analytics and I don't have much traffic at all, so I am at a loss about how to prevent or solve this problem.

    I don’t know which pages they are looking at/indexing.

    I read somewhere that you can offload your RSS feed to FeedBurner, but I don't know whether that would solve this issue or not.

    Any ideas are much appreciated!

  • Umm, get better hosting. 15 connections is a really, really small number.

    and 4 sites is too many on one host.

    WP-Super-Cache. Why hit the database at all?

    4 sites is not really a lot for one host unless each of those 4 sites gets more than 1K uniques per day.

    whooami (@whooami), Member

    “4 sites is not really a lot for one host unless each of those 4 sites gets more than 1K uniques per day”

    Huh? Assuming you mean host == server, 4k uniques isn't that much for a well-equipped box.

    The thing is that the OP is probably on shared hosting, and it’s oversold, of course.

    Traffic is traffic, regardless of what's behind the IP, and as was pointed out, 15 concurrent MySQL connections is nothing.

    And even if host == one hosting account serving multiple domains: 4,000 uniques is fine as long as you stay within your bandwidth limits ... it's the bandwidth limitation that's important. 30 GB a month is just about right for that, IF there isn't a lot of media being served.

    @whooami oops, my bad. I meant using the same account on the shared server. I have to agree on the oversold part; it is definitely oversold.

    whooami (@whooami), Member

    🙂 I think we all agree, time for this person to get a better host. Tell them to wank off, ask for your money back, and move on, webdev2 🙂

    Or use WP-Super-Cache, I s’pose.

    But that's probably a band-aid on a larger problem.

    Thanks Otto. I use WP-Cache Manager now. Is WP-Super-Cache better?

    WP-Super-Cache is far better. Especially if you don’t have most readers logging into your site.

    Also, WP-Super-Cache incorporates the older WP-Cache functionality right into it. WP-Cache is no longer supported, I believe.
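
    A quick sanity check from the shell once the plugin is enabled in wp-admin, assuming a typical install layout (the paths below are placeholders, so adjust them to your own WordPress root): WP-Super-Cache adds a WP_CACHE define to wp-config.php and writes static copies of pages under wp-content/cache/supercache/, so cached pages can be served without touching MySQL at all.

    # Placeholder paths -- point these at your actual WordPress root.
    grep "WP_CACHE" /var/www/html/wp-config.php
    # Static page copies should start appearing here once the cache warms up.
    ls /var/www/html/wp-content/cache/supercache/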

    Many thanks Otto!

    Otto – Under:
    Rejected User Agents

    Do I need to leave these or remove them if I want to still be found and indexed by everyone who wants to index me:
    bot
    ia_archive
    slurp
    crawl
    spider

    I would suggest setting up a robots.txt as well ..

    Then parse your logs; any bot that isn't following it, route to localhost.

    palamedes – I appreciate your help but I have no idea what that means.

    Ah sorry..

    http://www.robotstxt.org/robotstxt.html

    The robots.txt file is a file you can put at the root of your site that instructs the various web search bots out there what they can and can't crawl. Moreover, you can put a time limit in there that says “only crawl this often”. (Google doesn't listen to it, but you can log into their webmaster site and control the crawl rate from there.)

    The robots.txt file on my site looks like this:

    User-agent: *
    Crawl-delay: 240
    Disallow: /mint/
    Disallow: /uploads/
    Disallow: /trap/

    The lines that are important for you are the Crawl-delay and the Disallow. Basically, it tells robots not to crawl anything in the mint, uploads, or trap directories, and to wait 240 seconds between requests to my site.

    The crawl-delay will do a lot to help keep the bots that follow robots.txt at bay.
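
    For the specific bots named at the top of the thread, here is a minimal sketch of per-bot entries, assuming robots.txt lives in the docroot shown below and treating the 120-second delay as a placeholder value: Yahoo's Slurp and msnbot honour Crawl-delay, while Googlebot ignores it, so Google's crawl rate has to be set through their webmaster site as noted above.

    # Sketch only: append per-bot crawl delays to an existing robots.txt.
    # The docroot path and the 120-second delay are placeholders -- tune to taste.
    printf 'User-agent: Slurp\nCrawl-delay: 120\n\n' >> /var/www/html/robots.txt
    printf 'User-agent: msnbot\nCrawl-delay: 120\n' >> /var/www/html/robots.txt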

    ANY robot that doesn't follow that file, or falls into the trap of going to a directory specifically disallowed, will show up in my logs. My log scraper will then route it to localhost:

    route add -host {incoming.annoying.bots.ip} gw 127.0.0.1

    What this does is basically tell their bot to go search itself. Usually the bot will hang there, holding its TCP connection open until it times out (5 minutes or so), and it costs you nothing ~ it's a way to slow ’em down or at the very least give them a “go away” message.

    Search your logs for any IP that is pounding on your site and route it..
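
    A rough sketch of that log-parsing step, assuming an Apache-style access log at the path below (adjust for your host): the first command lists the 20 busiest client IPs, and the route command, which needs root, sends one confirmed offender to loopback exactly as above, using a documentation-range placeholder address.

    # List the 20 IPs making the most requests to the site.
    awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20
    # Route a confirmed offender to loopback (placeholder address).
    route add -host 203.0.113.50 gw 127.0.0.1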

    whooami (@whooami), Member

    I prefer:

    ip rule add blackhole from 203.0.113.12

    where the IP matches the icky IP (the address above is just a placeholder).
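
    If you go this route, the rules are easy to inspect and undo later; a small sketch, run as root and again using a placeholder address:

    # Drop traffic from a noisy source at the routing-policy level.
    ip rule add blackhole from 203.0.113.12
    # List the current policy rules to confirm it took effect.
    ip rule show
    # Remove the rule once the bot calms down.
    ip rule del blackhole from 203.0.113.12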

  • The topic ‘site overwhelmed (shut down) by spiders and feeders’ is closed to new replies.