WordPress.org Forums

.htaccess & robots.txt problem (23 posts)

  1. thetexasreport
    Member
    Posted 5 years ago #

    Hello,

    I will work on making this descriptive so my chances of getting it resolved are higher.

    I've been trying to resolve this for hours now -

    * I attempted to submit my sitemap to Google - however they came back and said it is being blocked by my robots.txt. There is no robots.txt on my server.

    * But I called up my host (GoDaddy) for the 3rd time and told them this. They had guys looking at it for about 45 minutes. The only thing they could derive is that my .htaccess is causing the robots.txt to be written automatically, therefore blocking Google's access to my site.

    The following is the URL of my sitemap (which you probably won't be able to access): http://www.thetexasreport(dot)com.

    Remember, I do not have a visible robots.txt file anywhere on my server. I've gone through every folder.

    Can someone please help?

    Thanks in advance,

    Joel

  2. thetexasreport
    Member
    Posted 5 years ago #

    Just for clarification, the entire URL for my sitemap is: thetexasreport(dot)com/sitemap.xml
    Also, I used the Arne Brachhold Google sitemap generator.

  3. thetexasreport
    Member
    Posted 5 years ago #

    Anyone have any thoughts on this, by chance?

  4. iridiax
    Member
    Posted 5 years ago #

    The only thing they could derive is that my .htaccess is causing the robots.txt to be written automatically.

    Huh???

    I attempted to submit my sitemap to Google - however they came back and said it is being blocked by my robots.txt. There is no robots.txt on my server.

    Are you sure that it is not actually a "robots.txt unreachable" error?

  5. whooami
    Member
    Posted 5 years ago #

    http://www.thetexasreport.com/robots.txt

    There's either one there and you missed it, or you've got a setting or a plugin doing the work.

  6. whooami
    Member
    Posted 5 years ago #

    Gee, I refreshed it and the text changed. Who woulda thunk.

  7. thetexasreport
    Member
    Posted 5 years ago #

    This is the error I am getting from Google:

    URL restricted by robots.txt
    We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit

    Also, why would the path to my robots.txt be valid when there is NONE showing on my server?

    http://www.thetexasreport.com/robots.txt

    The support reps at GoDaddy (after 45 minutes of studying it) said it was a problem with the .htaccess that is automatically generating the robots.txt.

    This is insane.

  8. thetexasreport
    Member
    Posted 5 years ago #

    Well, I tried making one minor change to the robots.txt... that's why the text changed.

    I put it back to default so I don't confuse anyone.

  9. thetexasreport
    Member
    Posted 5 years ago #

    Let me rephrase that... I uploaded a robots.txt to try and see if I could overwrite one not showing on the server.

    This didn't help.

  10. whooami
    Member
    Posted 5 years ago #

    Is there something I am missing, or are you just not getting this?

    You said this in your first post:

    Remember, I do not have a visible robots.txt file anywhere on my server. I've gone through every folder.

    I've just shown you that you do. And I noted watching the text change on a reload of the page.

    Then, you say this:

    Well, I tried making one minor change to the robots.txt... that's why the text changed.

    I put it back to default so I don't confuse anyone.

    Do you not see the contradictory statements here?

    What are you trying to do?

    If you want Google to not be blocked, remove that file, OR fix it so Googlebot has access. You made that change once, you said it above, and I saw it. But then you changed it back so as not to confuse anyone?

    If you uploaded a robots.txt and nothing changed, then something else is generating the one being served.

    Check that the version of WordPress you are using does not have the privacy option ticked on. Next, look at your plugins, particularly SEO plugins.

    Like I said earlier, there are only three potential causes of this: a robots.txt existing on your server, the WordPress privacy option, or a plugin setting.
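
    If you want to check the privacy option directly, a throwaway snippet like this will print it (a hypothetical helper, assuming a standard install with wp-load.php in the web root; delete the file when you're done):

    <?php
    // check-privacy.php: hypothetical helper, not part of WordPress.
    // Loads WordPress and prints the option behind Settings -> Privacy.
    require 'wp-load.php';
    // '1' (or empty) means search engines are allowed;
    // '0' means they are blocked and WP will serve "Disallow: /".
    echo 'blog_public = ' . get_option( 'blog_public' );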

  11. thetexasreport
    Member
    Posted 5 years ago #

    # 1 - I know the robots.txt file can be viewed. I saw the text too.

    However, I cannot find the file ANYWHERE on my server.

    # 2 - After I made the change to the NEW robots.txt file, I uploaded it just to see if I could overwrite one I was missing - this did not fix the problem. I still got the same error from Google after I re-submitted the sitemap.

    Not trying to make this confusing.

    What am I trying to do? I want to remove the robots.txt file from my server so I can submit my sitemap.

    However, GoDaddy looked for it for 45 minutes and even they could not find it, despite the fact that we could view the text.

    This is why it's so strange.

  12. whooami
    Member
    Posted 5 years ago #

    It's not strange.

    WP has a privacy setting. I've mentioned it three times now.

    Have you checked that?

    Have you checked all of your plugin settings?

  13. thetexasreport
    Member
    Posted 5 years ago #

    Yes, I have checked the privacy settings.

    I have checked the plugin settings to the best of my ability... not really sure what I am looking for in these.

    I think it's strange because my host support looked at it for 45 minutes and they were baffled. And I've mentioned it multiple times as well.

    I'm not an IT guy, so I'm really trying to dig in and see what's going on here.

  14. thetexasreport
    Member
    Posted 5 years ago #

    What's going on here is still extremely confusing.

    Anyone want to give this a shot?

  15. Samuel Wood (Otto)
    Tech Ninja
    Posted 5 years ago #

    Look, this is not that complicated.

    The robots.txt is being generated by WordPress. It is not a file on the server; WordPress is creating it on the fly, based on your settings on the Settings->Privacy page.

    Because you have set it to be not private, the robots.txt you currently have contains this:
    User-agent: *
    Disallow:

    This robots.txt basically says "Allow everything". Google will not block anything because of this. However, Google may have your old robots.txt cached, and will need to refresh before you can do anything with it. If at any given point in the past you had it set to "Private", then Google may have noticed it then and will need time before it gets around to rechecking your site. Until that happens, you can't do anything.
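
    For reference, this is roughly what WordPress does when /robots.txt is requested and no physical file exists (a simplified sketch of do_robots() from wp-includes/functions.php; the exact code varies by version):

    <?php
    // Simplified sketch of WordPress's virtual robots.txt handler. A
    // rewrite rule routes requests for /robots.txt here whenever no
    // physical file exists in the web root.
    function do_robots() {
        header( 'Content-Type: text/plain; charset=utf-8' );
        if ( '0' == get_option( 'blog_public' ) ) {
            // Settings -> Privacy is set to block search engines
            echo "User-agent: *\nDisallow: /\n";
        } else {
            echo "User-agent: *\nDisallow:\n";
        }
    }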

    In other words, you don't need to *do* anything. Your site is now correct. Google needs to notice that, and you need to wait for it to do so.

  16. thetexasreport
    Member
    Posted 5 years ago #

    Thank you, Otto42, for giving me a straight answer.

    I understand.

  17. Jimmi_Henderson
    Member
    Posted 5 years ago #

    Lol.. I have been reading, searching, testing and tweaking my blog for a week trying to get this same problem resolved... I just checked the privacy settings, and apparently it was set to block all search engines... lol
    Just resubmitted, and I have a feeling that it will take, since my robots.txt is actually allowing the / directory... lol, what a f****ing joke..
    ***The thing is, I am using the XML sitemap plugin... I am used to building sitemaps old school and tossing one into a directory...
    That worked great in the past, but how can I do that with a blog? With the content being dynamic and all, I would have to refresh it every three days or so, which doesn't seem efficient. But at the same time, these plugins seem to be leaking "link juice" with all of the outbound links in them... Any thoughts?

  18. Jimmi_Henderson
    Member
    Posted 5 years ago #

    In other words, you don't need to *do* anything. Your site is now correct. Google needs to notice that, and you need to wait for it to do so.

    I see what you're saying here, but in Webmaster Tools you should be able to resubmit the sitemap and then not get the error again... Google says it may take a few hours, but every time I submit it takes 20 minutes or so...
    Therefore it would suck to wait around on the problem a few days and not have it resolved... Lol, whenever my host doesn't know how to fix the problem they always tell me to take an action and then wait... lol... Waiting has yet to fix any of the problems.

  19. raeph
    Member
    Posted 5 years ago #

    Uff! That thread saved me from going completely crazy...
    Thanks for the clarification, Otto42!

  20. moongoose
    Member
    Posted 5 years ago #

    I am having a similar problem with a client's site that was indexed fine by Google and suddenly is not being indexed. I tried submitting a sitemap after the problem started, and I received the error:

    Network unreachable: robots.txt unreachable

    However, the robots.txt file that is being autogenerated by WP is fine -- nothing is blocked.

    I suspect that my web host may be blocking some of Google's IP addresses. Here is a post I found related to this:

    Besides a problem at Google's end, this issue is most commonly caused by hosts blocking one or more of Google's IP addresses. This is why I advise you to contact your host and ask them to check whether they are blocking any of Google's IP addresses. You can find a list of these IPs at the following URLs:

    http://www.webmasterworld.com/forum24/517.htm

    and

    http://www.phpbb-seo.com/en/seo-prin...cle-t2169.html
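
    If your access logs show requests claiming to be Googlebot, you can also confirm whether those IPs are genuine using the reverse/forward DNS check that Google documents (a sketch; the IP below is only an example, substitute one from your own logs):

    <?php
    // verify-googlebot.php: sketch of the reverse/forward DNS check.
    $ip   = '66.249.66.1';          // example: use a real IP from your logs
    $host = gethostbyaddr( $ip );   // reverse lookup, e.g. crawl-66-249-66-1.googlebot.com
    $back = gethostbyname( $host ); // forward-confirm the reported hostname
    if ( preg_match( '/\.googlebot\.com$/', $host ) && $back === $ip ) {
        echo "$ip really is Googlebot\n";
    } else {
        echo "$ip is NOT Googlebot\n";
    }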

  21. tapaninaho
    Member
    Posted 5 years ago #

    Hey all,

    Any idea how long it might take for the robots.txt status to change in Google's cache?

    I changed the privacy settings about 15 hours ago and Google still thinks it's been blocked. I would've thought they'd check the robots.txt status as you submit the sitemap.

    fyi, my robots.txt is here: http://suklaa.org/robots.txt
    and sitemap I'm trying to submit is here: http://suklaa.org/sitemap.xml

    I'm on MediaTemple and my other sites don't have this problem so I don't think they're blocking Google IPs.

    Thanks!

  22. Scott Winterroth
    Member
    Posted 5 years ago #

    I'm having the same problems with Google. Everything was great, then boom, it stopped crawling my site.

  23. caymanhost
    Member
    Posted 4 years ago #

    I'll throw a few things into the ring here as I recently had this infuriating problem too.

    Firstly I thought it was a problem with the Google XML sitemap generator plugin, then suspected Bad Behavior, and finally that it was a server security setting or similar.

    As someone already mentioned, make sure first of all that your privacy settings from within WP-admin have been changed to allow search engines or you will get nowhere :-)

    I eventually solved it with a combination of things. Although I'm still not sure exactly why it happened, it's fixed and at least the crawlers can reach my sites again. Here is what I had to do:

    1) Deleted my robots.txt files from the root (not ideal, but for now they are gone)
    2) Ensured that the option to add the sitemap to robots.txt in the XML Sitemaps plugin was turned off (see the note after this list)
    3) Checked my .htaccess files
    For some reason there were lines in my .htaccess file that I certainly did not add manually; they must have been generated by either a plugin or something I changed via cPanel. I have no idea, but if anyone can suggest the likely culprit I'd love to know. Here is a sample of one of those rules:

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^.*(bot|urp|msn|google\.|oogle\.|msn\.|live\.com|yahoo\.|altavista\.com|looksmart\.com).* [NC]
    RewriteRule ^(.*) /2.html [NS,NC,L]

    If you have anything similar, delete it. That rule silently rewrites every request whose user agent matches a major search engine to /2.html, so the crawlers never see your real pages. I deleted them all and they seem to have been the major culprit.

    4) Rebuilt my sitemap manually in the Google XML Sitemaps plugin configuration
    5) Resubmitted my sitemaps to Google via Webmaster Tools (it may take a couple of attempts)
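
    For reference on point 2: when that "add sitemap to robots.txt" option is on, the plugin appends a line like this to the generated robots.txt (the URL here is just a placeholder):

    Sitemap: http://www.example.com/sitemap.xml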

    You can check Googlebot's responses to your site, as well as Yahoo Slurp's and Bing's, by using the following tools and setting the appropriate user agent:

    http://www.seoconsultants.com/tools/headers/

    and

    http://web-sniffer.net/

    I was getting 500 internal server errors when checking the root domain as well as my sitemaps and robots.txt files. If you are getting the same, you still have work to do, but once you get a 200 success result you should be close to getting the crawlers to come back.
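
    If you'd rather test from your own machine, a small PHP script can request a page with Googlebot's user-agent string and print the response status and headers (a sketch; the file name and URL are placeholders to replace with your own):

    <?php
    // fetch-as-googlebot.php: hypothetical helper that requests a URL
    // using Googlebot's user agent and prints the response headers, to
    // spot bot-specific blocks like the rewrite rule above.
    $context = stream_context_create( array( 'http' => array(
        'user_agent'    => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
        'ignore_errors' => true, // keep the headers even on a 4xx/5xx response
    ) ) );
    file_get_contents( 'http://www.example.com/robots.txt', false, $context );
    // $http_response_header is populated by file_get_contents()
    foreach ( $http_response_header as $line ) {
        echo $line . "\n"; // first line is the status, e.g. "HTTP/1.1 200 OK"
    }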

    As well as submitting your sitemaps to Google, it is probably a good idea to use the webmaster tools at Bing and Yahoo to make sure they come back again too - there are links to both within the sitemaps plugin.

    Hope this helps someone out.

    Maurice

Topic Closed

This topic has been closed to new replies.
