• yaksox

    (@yaksox)


    Hi,

    I recently had a server issue with my site, where I run a WordPress blog. I was getting 510 errors and the site was only intermittently reachable.

    The place where I have it hosted is great, and after I emailed them, they fixed it. But what they said had been causing the problem was interesting. I’ll quote the email:

    This may be due to the additional restrictions we’ve placed on your website following an investigation into server reliability which indicated that periodic fetching of your entire blog archive by numerous hosts was in fact consuming most of the RAM on the server, resulting in unavailability of most services on the server for periods of up to 10 minutes.

    I’ve increased the number of simultaneous connections allowed to your Virtualhost – Please let me know if the issue persists, however I believe now that I’ve more than doubled the limit of simultaneous connections, the issue should be resolved.

    My question is: is there anything I can do to stop bots or spiders or whatever from indexing (downloading) the whole site so often?
    I’ve been running a blog (now all imported into WP) for nearly 10 years, so the archives are fairly large.

    The answer might not even be a WP thing, but more of an HTML thing. Like, is there some robots.txt code that will limit bots to scanning the archive once every couple of months? I don’t want the archive to drop off the radar completely by using nofollow, but the archive hardly ever changes, so it doesn’t need to be scanned that often.

    I want to do the right thing by my webhost and not needlessly chew bandwidth.

    thanks for any advice,

    YS

  • katyakarski

    (@katyakarski)

    This won’t work for every spider, but Google has an option for webmasters called Google Sitemaps – there are plugins to make this super easy. It allows you to set the relative importance of certain areas of your blog and “suggest” a crawling frequency to Google.
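    For illustration, here’s roughly what a single entry in a sitemap file looks like (the URL is made up; changefreq and priority are the standard sitemap fields the plugin lets you tune per section):

        <url>
          <loc>http://example.com/archives/2001/05/some-old-post/</loc>
          <lastmod>2001-05-14</lastmod>
          <changefreq>yearly</changefreq>
          <priority>0.2</priority>
        </url>

    Keep in mind changefreq is only a hint – crawlers are free to ignore it – but for archive pages that never change, “yearly” or “never” is an honest suggestion.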

    robots.txt is a bit more brutal, but this should at least help Google calm down a little (and it is by far the most aggressive of all the crawlers).
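    If you do want to try the robots.txt route without hiding the archive, a minimal sketch would be something like this (assuming the crawler honors the non-standard Crawl-delay directive – Yahoo and MSN do; Google ignores it and takes its crawl-rate setting from its webmaster console instead):

        # ask compliant crawlers to wait 10 seconds between requests
        User-agent: *
        Crawl-delay: 10

    This doesn’t reduce how often the whole archive gets crawled, but it spreads the requests out so they don’t pile up and eat the server’s RAM all at once.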

    Thread Starter yaksox

    (@yaksox)

    Thanks Katya. 🙂
    I used the Google XML Sitemaps plugin:
    http://wordpress.org/extend/plugins/google-sitemap-generator/
    Hopefully it makes a difference.

    katyakarski

    (@katyakarski)

    That’s the one I use as well. It’s a great plugin.
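    One more thing worth trying: you can point crawlers at the sitemap straight from robots.txt with the Sitemap directive, so even bots that never visit a webmaster console can find it (example URL only):

        Sitemap: http://example.com/sitemap.xml

    The plugin may offer to add this line for you; if not, it’s safe to add by hand.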

  • The topic ‘limit indexing of archives by bots’ is closed to new replies.