MSN bot (and, to a lesser extent, Googlebot) has been crawling my site every day for the last week between 10 AM and 3 PM PST. The volume of requests from MSN bot makes traffic to my site spike by 300-400%, which my server cannot handle, so I get a lot of intermittent "ERROR CONNECTING TO DATABASE" failures until the bot stops hammering the site. Additionally, many of the requests hit an invalid URL (a link to a post with an extra " at the end of the URL), which is filling my logs with 404 errors.
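For anyone who wants to measure the same thing on their own server, here is a minimal sketch of how the spike and the 404s could be pulled out of an access log. It assumes the Apache/Nginx "combined" log format, which may not match every setup; `summarize`, `SAMPLE_LINES`, the addresses, dates, and paths are all made up for illustration and are not taken from my real logs.

```python
import re
from collections import Counter

# Assumption: "combined" access log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>.*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize(lines, bot_token="msnbot"):
    """Count one bot's requests per hour and the paths that returned 404 to it."""
    per_hour = Counter()
    not_found = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match or bot_token not in match["agent"].lower():
            continue
        # Timestamps look like 10/Oct/2009:10:15:32 -0700; keep date plus hour.
        per_hour[match["time"][:14]] += 1
        if match["status"] == "404":
            parts = match["request"].split()
            if len(parts) >= 2:
                not_found[parts[1]] += 1
    return per_hour, not_found


# Hypothetical log lines, invented for this example.
SAMPLE_LINES = [
    '192.0.2.10 - - [10/Oct/2009:10:15:32 -0700] "GET /post/123 HTTP/1.1" '
    '200 5120 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.10 - - [10/Oct/2009:10:15:33 -0700] "GET /post/123%22 HTTP/1.1" '
    '404 312 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.99 - - [10/Oct/2009:11:02:01 -0700] "GET / HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0"',
]

per_hour, not_found = summarize(SAMPLE_LINES)
print(per_hour.most_common())   # busiest hours for the bot
print(not_found.most_common())  # most frequently requested missing paths
```

Running `summarize(open("access.log"))` on a real log (the filename is a placeholder) would give the hourly request counts for the bot and the list of bad URLs it keeps requesting, which makes it easier to see whether the extra `"` (logged here as `%22`) always follows the same post links.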
I guess my questions at this time are:
- Does anyone have any idea why MSN is crawling like this, and is there anything I can do to alleviate the problem?
- Where does this erroneous URL come from? Does MSN / Bing use some kind of sitemap file where they might be getting a bad URL?
So far, I've tried adding a "Crawl-delay: 1" directive to my robots.txt, but it hasn't seemed to help much (at least not yet).
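One thing that can be checked locally is whether the directive is even readable where it sits in the file. The sketch below uses Python's standard `urllib.robotparser`, which only picks up `Crawl-delay` when it appears inside a `User-agent` group; whether MSN bot applies the same rule is an assumption on my part. The `ROBOTS_TXT` contents, the 10-second value, and the helper `crawl_delay_for` are hypothetical and are not my actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the 10-second delay is only an example value.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Crawl-delay: 1
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the crawl delay Python's parser sees for a user agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for(ROBOTS_TXT, "msnbot"))     # delay from the msnbot group
print(crawl_delay_for(ROBOTS_TXT, "Googlebot"))  # falls back to the * group
# A directive with no User-agent line above it is ignored by this parser.
print(crawl_delay_for("Crawl-delay: 1\n", "msnbot"))
```

If the last call describes the real file (a bare directive with no `User-agent` line above it), that could be one reason the setting appears to have no effect, although this only shows how Python's parser reads the file, not how any particular crawler does.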
Does anyone have any similar experiences or any advice?