W3 Total Cache
[resolved] 403 Crawl Error on Google - All pages have been removed from google index (5 posts)

  1. kylen20
    Member
    Posted 2 years ago #

    Hi,

    I've just discovered that all of the pages on all 7 of my websites have been removed from Google. I'm panicking somewhat, as my business depends on being able to be found online.

    There seem to have been 403 crawl errors since some time in June, and as a result all of my sites have been removed from Google, with the exception of the home pages. The error listed in Google Webmaster Tools is:

    URLs not accessible
    When we tested a sample of the URLs from your Sitemap, we found that some URLs were not accessible to Googlebot due to an HTTP status error. All accessible URLs will still be submitted.

    The only thing that has changed is that I moved the sites to Cloudflare early in June, via W3 Total Cache. Please get back to me asap and advise how to fix this. The other thing I noticed is that my websites all now appear twice in the Webmaster Tools listing: once with the www prefix and again without it.

    The domains are:

    http://www.kyle-newman.com
    http://www.kyle-newman.tv
    http://www.movebeyond.net
    http://www.movebeyond.tv
    http://www.beyondhealth.net
    http://www.adventurebeyond.net
    http://www.beyondtheleadingedge.com

    Thanks! Kyle

  2. Frederick Townes
    Member
    Plugin Author

    Posted 2 years ago #

    It's hard to guess why this has happened, because how you have W3TC and Cloudflare configured is a complete unknown. What are the errors in your error log? Have you spoken to Cloudflare?

    Please get back to me asap and advise how to fix this. The other thing that I noticed is my websites all now appear twice in the Webmaster tools listing. Once with the www prefix and again without it.

    That's normal and has nothing to do with the issue here.

  3. kylen20
    Member
    Posted 2 years ago #

    I have posted the same query on Cloudflare. They have come back to me, and their first suggestion is to add their IP addresses as an Allow in the .htaccess file. So I'm doing that now.
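    (For anyone following along, a minimal sketch of what that allow-listing might look like in Apache 2.2-style .htaccess. The IP range below is illustrative, taken from the Cloudflare proxy that appears in the log later in this thread; use Cloudflare's current published list of ranges:)

    ```apache
    # Let Cloudflare's proxy addresses past any Deny rules.
    # The range is illustrative only; substitute Cloudflare's
    # current published IP ranges.
    Order Deny,Allow
    Allow from 199.27.128.0/21
    ```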

    This is a copy of the error message:

    URLs not accessible
    When we tested a sample of the URLs from your Sitemap, we found that some URLs were not accessible to Googlebot due to an HTTP status error. All accessible URLs will still be submitted.

    In the diagnostics section all of the URL errors are the same as this one. For every page on my sites:

    http://www.movebeyond.net/category/know-no-limits/
    403 error Jul 24, 2011

    Let me know what other information you need. Thanks!

  4. kylen20
    Member
    Posted 2 years ago #

    I've just had my web-hosting company look into this to check that it's not a problem on their server and they have suggested that this line in the .htaccess file could be the problem:

    RewriteCond %{HTTP_COOKIE} w3tc_referrer=.*(google\.com|yahoo\.com|bing\.com|ask\.com|msn\.com) [NC]
    RewriteRule .* - [E=W3TC_REF:_search_engines]
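    (For reference, these two directives only set an environment variable; the `-` substitution means no rewrite occurs and no status code is returned, so on their own they shouldn't produce a 403. Annotated:)

    ```apache
    # If the w3tc_referrer cookie shows the visitor arrived
    # from a major search engine...
    RewriteCond %{HTTP_COOKIE} w3tc_referrer=.*(google\.com|yahoo\.com|bing\.com|ask\.com|msn\.com) [NC]
    # ...set the W3TC_REF environment variable. The "-" target
    # means "no substitution": the URL is left untouched and no
    # response code is sent, so this rule alone cannot block Googlebot.
    RewriteRule .* - [E=W3TC_REF:_search_engines]
    ```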

    Please advise asap. Thanks!

  5. kylen20
    Member
    Posted 2 years ago #

    Right, I think I may have solved the problem. At my hosting company's suggestion, I used a Firefox plugin called 'User Agent Switcher' to browse my site as Googlebot, and discovered that I was being blocked by the Bad Behavior plugin (http://wordpress.org/extend/plugins/bad-behavior/) that's installed as part of the security on my site.

    It would appear that because the Googlebot request was being forwarded through Cloudflare, the plugin was blocking it. This was the error message I received:

    Error 403

    We're sorry, but we could not fulfill your request for / on this server.

    An invalid request was received. You claimed to be a major search engine, but you do not appear to actually be a major search engine.

    Your technical support key is: adf5-3158-f118-2195

    You can use this key to fix this problem yourself.

    If you are unable to fix the problem yourself, please contact kyle at movebeyond.net and be sure to provide the technical support key shown above.

    So I then checked the error log and found a whole host of blocked requests, e.g.:

    199.27.128.138
    cf-199-27-128-138.cloudflare.com

    2011-07-27 10:51:52

    User-Agent claimed to be Googlebot, claim appears to be false.

    GET /robots.txt HTTP/1.0
    Accept: text/plain,text/html
    Accept-Encoding: gzip
    Cf-Connecting-Ip: 66.249.67.215
    Cf-Ipcountry: US
    Connection: close
    From: googlebot(at)googlebot.com
    Host: www.movebeyond.net
    User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    X-Forwarded-For: 66.249.67.215

    I'm assuming that this is the problem, and I have deactivated that plugin for now. Given the way Cloudflare works to filter traffic, do I still need this plugin to block spambots?
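    (A note for anyone hitting the same thing: the underlying issue is that the server sees Cloudflare's proxy IP rather than the visitor's, so IP-based checks like Bad Behavior's Googlebot verification fail. One hedged fix, assuming Apache 2.4's mod_remoteip is available; Cloudflare also provides its own mod_cloudflare module for the same job:)

    ```apache
    # Take the real client IP from Cloudflare's header, but only
    # when the request arrives from a trusted Cloudflare proxy.
    # Requires mod_remoteip (Apache 2.4+). The range matches the
    # cf-199-27-128-* proxy in the log above and is illustrative only.
    RemoteIPHeader CF-Connecting-IP
    RemoteIPTrustedProxy 199.27.128.0/21
    ```

    With the real IP restored, Bad Behavior's reverse-DNS check on "Googlebot" user agents can succeed again, so the plugin need not be disabled outright.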

    Thanks again for your prompt response.

    Kyle

Topic Closed

This topic has been closed to new replies.
