WordPress.org

Forums

Robots.txt ISSUE (4 posts)

  1. SammyJayJay
    Member
    Posted 2 years ago

    Hello,

    I have a problem with Google not being able to access my robots.txt.

    This is the message I get when testing my submitted sitemap:

    Network unreachable: robots.txt unreachable. We were unable to crawl your Sitemap because we found a robots.txt file at the root of your site but were unable to download it. Please ensure that it is accessible or remove it completely.

    My site is http://cheapautoinsuranceinfo.net. It's just a crappy little test site, but I need to resolve this issue so I understand it if it ever happens on an important site.

    I have the Google XML Sitemaps plugin installed, with the option to create a virtual robots.txt turned OFF.

    WordPress privacy settings are set to allow search engines.

    Permalinks are set to "Post name".

    I have a robots.txt file in the root of my domain:
    http://cheapautoinsuranceinfo.net/robots.txt

    This file seems fine to me; it opens perfectly, and websniffer.net shows no problems.

    I can't see anything wrong with my .htaccess file either.

    YET Google still cannot access it.

    Another weird thing I noticed is that the robots.txt file opens normally no matter how many slashes are in the URL. For example, http://cheapautoinsuranceinfo.net/////robots.txt opens fine. Surely that is supposed to result in a 404?
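    Many web servers (and the filesystems behind them) treat a run of consecutive slashes as a single path separator, in which case /////robots.txt maps to the same file as /robots.txt and no 404 is produced. A minimal sketch of that normalization, assuming the server simply collapses repeated slashes; real servers differ in the details, and `normalize_path` is a hypothetical helper:

```python
import re

def normalize_path(path: str) -> str:
    """Collapse every run of two or more slashes into one (assumed server behavior)."""
    return re.sub(r"/{2,}", "/", path)

for requested in ["/robots.txt", "//robots.txt", "/////robots.txt"]:
    # Under this assumption, all three requests resolve to the same file.
    print(requested, "->", normalize_path(requested))
```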

    Can anybody please help?

  2. Pioneer Valley Web Design
    Member
    Posted 2 years ago

    Your robots.txt file is basically blocking everything but the uploads folder.

    Try simply:

    User-agent: *
    Allow: /
    Sitemap: http://cheapautoinsuranceinfo.net/sitemap.xml
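    To sanity-check a file like this before uploading it, Python's standard urllib.robotparser module can parse the rules and report whether a given user agent may fetch a URL. A sketch, where /sample-post/ is a hypothetical path used only for illustration; Python's parser is not Google's, so edge cases in more complex files may be interpreted differently:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: http://cheapautoinsuranceinfo.net/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "/sample-post/" is a hypothetical path; with "Allow: /" every path is permitted.
print(parser.can_fetch("Googlebot", "http://cheapautoinsuranceinfo.net/sample-post/"))

# site_maps() (Python 3.8+) returns the Sitemap URLs declared in the file.
print(parser.site_maps())
```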

    A) Unless your sitemap is huge (it's not), there's no need to use the .gz compressed file.
    B) Robots.txt files are 'suggestions': nothing forces a bot to obey them, so only some other method will actually block it. BUT when you use Google Webmaster Tools to look at your site, Google will report errors based on your robots.txt file.
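    On point A: one quick way to see how big a sitemap actually is, compressed or not, is to count its url entries. A sketch, assuming the standard sitemaps.org 0.9 namespace; the sample sitemap is hypothetical. For scale, the sitemaps.org protocol limits a single sitemap file to 50,000 URLs:

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS_PER_SITEMAP = 50_000

def count_sitemap_urls(data: bytes) -> int:
    """Return the number of <url> entries in a sitemap, gzipped or plain."""
    if data[:2] == b"\x1f\x8b":  # gzip magic bytes: decompress first
        data = gzip.decompress(data)
    root = ET.fromstring(data)
    return len(root.findall(f"{NS}url"))

# Hypothetical two-page sitemap used only for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://cheapautoinsuranceinfo.net/</loc></url>
  <url><loc>http://cheapautoinsuranceinfo.net/sample-post/</loc></url>
</urlset>"""

n = count_sitemap_urls(SAMPLE)
print(f"{n} URLs (per-file limit: {MAX_URLS_PER_SITEMAP})")
```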

  3. SammyJayJay
    Member
    Posted 2 years ago #

    Google is not accessing my site at all. It can't even read the robots.txt file, and this has stopped Googlebot from crawling my site. I have changed robots.txt to the simple format you suggested, thanks for that, but the problem still exists.

    The problem is not what's in robots.txt; the problem is that Google can't even read it.
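    Why an unreadable robots.txt halts crawling regardless of its contents: the Google message quoted later in this post says Googlebot postponed its crawl after errors fetching the file. As Google's current robots.txt documentation describes it (the exact rules may have differed in 2012), the HTTP outcome of the robots.txt request decides what happens next. A simplified sketch of that decision; `robots_outcome` and its labels are illustrative, not a Google API:

```python
from typing import Optional

def robots_outcome(status: Optional[int]) -> str:
    """Simplified model of how a crawler like Googlebot reacts to a robots.txt fetch.

    `status` is the HTTP status code, or None if the server could not be reached.
    """
    if status is None:
        return "postpone crawl"              # network unreachable, timeout, etc.
    if 200 <= status < 300:
        return "obey rules in file"          # file was read; its rules apply
    if 300 <= status < 400:
        return "follow redirect"             # redirects are followed (not modeled further)
    if status == 429 or status >= 500:
        return "postpone crawl"              # server error or rate limiting
    if 400 <= status < 500:
        return "crawl without restrictions"  # e.g. 404: treated as if no robots.txt exists
    return "unknown"
```

    Under this model, a robots.txt that fails with errors (rather than returning a clean 200 or 404) is exactly the case that stops the crawl, no matter what the file says.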

    Can somebody please help? How can I resolve this problem?

    Below is the exact message from Google. I've checked everything I listed above, including with my server host, but nothing seems to have worked, and the Fetch as Google tool still can't access my robots.txt file or any URL on my site.

    http://cheapautoinsuranceinfo.net/: Googlebot can't access your site
    July 20, 2012

    Over the last 24 hours, Googlebot encountered 61 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 72.6%.

    You can see more details about these errors in Webmaster Tools.

    Recommended action
    If the site error rate is 100%:

    Using a web browser, attempt to access http://cheapautoinsuranceinfo.net//robots.txt. If you are able to access it from your browser, then your site may be configured to deny access to googlebot. Check the configuration of your firewall and site to ensure that you are not denying access to googlebot.
    If your robots.txt is a static page, verify that your web service has proper permissions to access the file.
    If your robots.txt is dynamically generated, verify that the scripts that generate the robots.txt are properly configured and have permission to run. Check the logs for your website to see if your scripts are failing, and if so attempt to diagnose the cause of the failure.

    If the site error rate is less than 100%:

    Using Webmaster Tools, find a day with a high error rate and examine the logs for your web server for that day. Look for errors accessing robots.txt in the logs for that day and fix the causes of those errors.
    The most likely explanation is that your site is overloaded. Contact your hosting provider and discuss reconfiguring your web server or adding more resources to your website.

    After you think you've fixed the problem, use Fetch as Google to fetch http://cheapautoinsuranceinfo.net//robots.txt to verify that Googlebot can properly access your site.
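    The message recommends checking the web server logs for failed robots.txt requests. A minimal sketch, assuming Apache's "combined" log format and a hypothetical log excerpt (documentation IP addresses, invented timestamps). It collapses repeated slashes so that //robots.txt, the URL Google reports, is counted too, and it identifies Googlebot by User-Agent string alone, which anyone can spoof:

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def robots_status_counts(lines):
    """Count HTTP status codes of Googlebot requests for /robots.txt."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in m["agent"]:
            continue  # unparseable line or not Googlebot
        # Drop any query string, then collapse "//robots.txt" to "/robots.txt".
        path = re.sub(r"/{2,}", "/", m["path"].split("?", 1)[0])
        if path == "/robots.txt":
            counts[m["status"]] += 1
    return counts

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [  # hypothetical entries for illustration only
    f'192.0.2.10 - - [20/Jul/2012:10:00:00 +0000] "GET //robots.txt HTTP/1.1" 500 0 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [20/Jul/2012:10:05:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [20/Jul/2012:10:06:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]
print(robots_status_counts(SAMPLE_LOG))
```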

    As you can see from my first message, Google can find the file but can't read it.
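    To test the message's hint that the server may be denying Googlebot specifically, one option is to request robots.txt once with a browser-like User-Agent and once with Googlebot's, then compare the results. A sketch using only the standard library; the User-Agent strings and helper names are illustrative assumptions, and a firewall that filters by IP address rather than User-Agent would not show up this way:

```python
import urllib.error
import urllib.request

ROBOTS_URL = "http://cheapautoinsuranceinfo.net/robots.txt"
USER_AGENTS = {  # example strings, assumed for illustration
    "browser": "Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

def build_request(url: str, agent: str) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": agent})

def fetch_status(req: urllib.request.Request, timeout: float = 10.0):
    """Return the HTTP status code, or a description of why the fetch failed."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server answered with a 4xx/5xx status
        return err.code
    except urllib.error.URLError as err:   # DNS failure, refused connection, etc.
        return str(err.reason)
    except OSError as err:                 # e.g. a timeout while reading the response
        return str(err)
```

    Usage: compare `fetch_status(build_request(ROBOTS_URL, USER_AGENTS["browser"]))` with the same call using `USER_AGENTS["googlebot"]`. A 200 for the browser but an error or 403 for the Googlebot string would point at User-Agent-based blocking.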

    And what's with the URL with so many slashes not returning a 404 like it should?

    PLEASE HELP

  4. santy143all
    Member
    Posted 2 years ago

    I am unable to access my robots.txt file; it's not showing in the WordPress folder I uploaded to the web. Do they access my root folder?
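    On where crawlers look: under the robots exclusion convention, a crawler derives the robots.txt location from the host alone and requests /robots.txt at the root of the site, wherever WordPress itself is installed. A sketch of that derivation with the standard library; `robots_url` is a hypothetical helper and the subfolder URL is made up:

```python
from urllib.parse import urljoin

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for this page's host."""
    # A path starting with "/" makes urljoin keep only the scheme and host of page_url.
    return urljoin(page_url, "/robots.txt")

print(robots_url("http://example.com/blog/wordpress/sample-post/"))
```

    Note that WordPress can also serve a generated ("virtual") robots.txt when no physical file exists, so a robots.txt that opens in the browser may not appear in the folder at all.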

Topic Closed

This topic has been closed to new replies.
