WordPress.org

Add Actual robots.txt or use Virtual robots.txt? (27 posts)

  1. AA
    Member
    Posted 4 years ago #

    Hi,

    I understand that WP has a virtual robots.txt file. The xml sitemap plugin has an option to add the Sitemap to the virtual robots file. I have chosen that option & the virtual robots.txt file is recognized by Google Webmaster Tools & does in fact include the Sitemap. The sitemap plugin states the following:

    The virtual robots.txt generated by WordPress is used. A real robots.txt file must NOT exist in the blog directory!

    So, what to do? Is the correct answer to not choose the option via the xml sitemap plugin to use the virtual robots.txt file & to create a new actual robots.txt file, where I can also include the Sitemap?

    I look forward to replies. Thanks!

  2. If you never plan to use an actual robots.txt file, feel free to use that option. If you do plan to use a real robots.txt file, disable the option and create/upload your own robots.txt file with the following line (use the location of your sitemap, of course):

    Sitemap: http://www.yourblog.com/sitemap.xml.gz

  3. ClaytonJames
    Member
    Posted 4 years ago #

    I think it means, that because you have chosen to use the virtual robots.txt, you just need to make sure that an actual "robots.txt" file does not exist in your wordpress directory.

    "A real robots.txt file must NOT exist in the blog directory!"

    If it does, the real one will be used, and not the virtual file.

  4. AA
    Member
    Posted 4 years ago #

    Thanks.

    Although I haven't "chosen" to use a virtual robots.txt file. It seems to be a default of WP.

    I want to be 100% certain that it's okay to create a new robots.txt file even though WP creates a virtual one :-)

  5. ClaytonJames
    Member
    Posted 4 years ago #

    Looks like some good reading on this page, too. There is an example of an optimized robots file for WordPress about halfway down the page.

    http://codex.wordpress.org/Search_Engine_Optimization_for_WordPress

  6. Saildude
    Member
    Posted 4 years ago #

    WordPress does not make a robots.txt file unless you tell it to, i.e. by setting your blog to "Private, allow regular users, block bots". Otherwise WP does not create a robots.txt file unless you use a plug-in of some sort.

    My test site had a virtual keep-out robots.txt file (I checked the "keep bots out" box), but my main site did not have a robots.txt file until I finally got tired of seeing the "file not found" errors in my log and made one myself.

  7. AA
    Member
    Posted 4 years ago #

    Thanks & yes, read that already along with many other threads but still no concrete answers. It would seem from the codex that creating a robots.txt is fine but then what happens to the virtual? Are there now duplicate robots.txt files? I don't want to take chances with the bots, ya know?

    Does anyone have any clear answers to such issues?

  8. AA
    Member
    Posted 4 years ago #

    Anyone have any concrete answers?

  9. Shane Hudson
    Member
    Posted 4 years ago #

    I think you would be best off using a real robots.txt... this way at least you know which one is being used, and if the plugin breaks or something goes wrong, there are fewer problems!

  10. AA
    Member
    Posted 4 years ago #

    Thanks but...

    What happens to the virtual? Are there now duplicate robots.txt files?

  11. WordPress's virtual robots.txt is only enabled if you select "I would like to block search engines, but allow normal visitors" in Settings/Privacy or if a plugin enables it via a setting like the one in the sitemap plugin.

    Regardless of whether or not it's active, most robots will take an actual robots.txt over WordPress' virtual robots.txt.

  12. AA
    Member
    Posted 4 years ago #

    macmanx,

    Thanks but the site is set to allow all & yet there is still a virtual robots.txt file. We have never, to the best of my recollection, ever had the Privacy set to not allow bots. Hmmm? I suppose it is possible that when I initially set up the WP site, that the Privacy setting was set to not allow, but if so, it has not been that way for a year.

  13. What do you mean "there is still a virtual robots.txt file"? If it's WP's virtual robots.txt file, you won't see it. How do you know that it's still there?

  14. AA
    Member
    Posted 4 years ago #

    I noticed the robots.txt file in Google Webmaster Tools. I can, of course, also go to the URL of the robots file: mysite[dot] com/robots.txt. Here are the contents of the file:

    User-agent: *
    Disallow:

    Sitemap: http://www.mysite [dot] com/sitemap.xml.gz

    I use the Xml Sitemap plugin & it has the following option, which I have ticked.

    Add sitemap URL to the virtual robots.txt file.
    The virtual robots.txt generated by WordPress is used. A real robots.txt file must NOT exist in the blog directory!

    This explains the Sitemap reference in the robots.txt file

  15. Strange, that's not a virtual robots.txt file. That's a real robots.txt file. If you're concerned, untick the "Add sitemap URL to the virtual robots.txt file" setting with the sitemap plugin, create your own robots.txt file at mysite[dot] com/robots.txt and fill it with the following text:

    User-agent: *
    Disallow:

    Sitemap: http://www.mysite [dot] com/sitemap.xml.gz

  16. AA
    Member
    Posted 4 years ago #

    Well, I have no robots.txt file on my server & have read many posts in many threads of others who have noticed the same. Why do you say it is not a virtual? Can't virtual mean that it is created within WP yet actually exists?

    Just a thought. I really want to be certain before I create a new robots.txt. I will untick that plugin option though & see if that removes the file. Stay tuned & thanks!

  17. AA
    Member
    Posted 4 years ago #

    Okay, so...I unticked the option to show the Sitemap in the virtual robots.txt file & the Sitemap is no longer in the robots.txt file, which can be seen via the url.

    The robots.txt file still exists.

  18. ClaytonJames
    Member
    Posted 4 years ago #

    Function Reference/do robots

    http://core.trac.wordpress.org/ticket/11918

    "do_robots(), in wp-includes/functions.php, is currently responsible for handling robots.txt requests for sites that do not have a robots.txt file.

    The default rules are quite lame. All it does is allow or disallow the entire site based on the privacy setting."
    ...that is, of course, paraphrasing the actual content of the trac ticket. The whole thing is worth reading if you're interested.
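
    As a rough illustration of the default rules described in that quote, here is a minimal Python sketch. It is not WordPress's code (the real function is PHP), and the function and parameter names are made up for illustration. It only mirrors the behaviour reported in this thread: one rule that either allows or blocks the whole site, depending on the privacy setting.

```python
def virtual_robots_txt(blog_public: bool) -> str:
    """Sketch of the default virtual rules; names here are hypothetical."""
    lines = ["User-agent: *"]
    # An empty Disallow allows everything; "Disallow: /" blocks the whole site.
    lines.append("Disallow:" if blog_public else "Disallow: /")
    return "\n".join(lines) + "\n"


print(virtual_robots_txt(blog_public=True))
```

    With the privacy setting open to search engines, the sketch produces the same two lines AA reports seeing at /robots.txt.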

    Make a real robots.txt file, and put a nice little test message in it for yourself. Place it in your blog root, and navigate to it with your browser. Confirm that you are seeing the file with your test text or not. That might give you a clear course of action (answer) one way or the other.
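
    If you would rather script that check than use a browser, something like the following Python sketch could work. The helper names and the marker text are assumptions for illustration; the idea is simply to download /robots.txt and look for the test message you put in the real file.

```python
from urllib.request import urlopen


def fetch_robots_txt(site_url: str, timeout: float = 10.0) -> str:
    """Download /robots.txt from the given site (needs network access)."""
    with urlopen(site_url.rstrip("/") + "/robots.txt", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def served_from_real_file(body: str, marker: str) -> bool:
    """True if the response contains the test marker placed in the real file."""
    return marker in body


# Hypothetical usage, with your own site and marker:
# served_from_real_file(fetch_robots_txt("http://www.example.com"), "# my-test-marker")
```

    A comment line starting with "#" makes a convenient marker, since robots.txt parsers ignore comments.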

    Alan, I think you might find seo-browser.com an interesting secondary tool, if you haven't already used it. I just use it for quick references and basic information gathering. It seems to work best in advanced mode.

  19. AA
    Member
    Posted 4 years ago #

    Thanks ClaytonJames...I was just about to create my own but on a site that doesn't get much traffic ;-) I don't like to perform tests on the site that brings home the bacon, ya know.

    I will check out the links as well :-)

  20. ClaytonJames
    Member
    Posted 4 years ago #

    One other thought, you may also want to check and make sure that your privacy settings are set to allow all traffic. I think that setting enables the update services in the admin writing sub-panel.

  21. AA
    Member
    Posted 4 years ago #

    Of course...

    Which is why I am so careful with robots.txt. I do not want this to negatively impact my present rankings. I do have a new site that is still set to block bots & the WP virtual robots.txt indicates this as well.

    Believe me, I have spent days researching this & no one seems to be totally clear on the issue.

  22. Ok, I see what you're saying. I think we were both misreading each other. The virtual robots.txt file does exist when you go to mysite[dot] com/robots.txt, but the file itself does not exist if you check with an FTP client or a file manager because WordPress uses some fancy rewriting to make it appear in the browser, hence the "virtual" part.

    If you create your own robots.txt file at mysite[dot] com/robots.txt, it will override WordPress' virtual robots.txt file.
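
    To make the override concrete, here is a small Python sketch of the decision. It assumes the usual setup where the web server serves a file that exists on disk and only hands the request to WordPress when no such file exists. The function name is made up for illustration, and the temporary directory stands in for the blog's root directory.

```python
import tempfile
from pathlib import Path


def robots_txt_source(docroot: str) -> str:
    """Report which robots.txt a visitor would get for this document root."""
    if (Path(docroot) / "robots.txt").is_file():
        return "real"     # the web server serves the file on disk
    return "virtual"      # the request falls through to WordPress


with tempfile.TemporaryDirectory() as docroot:
    before = robots_txt_source(docroot)
    # Uploading a real file is all it takes to override the virtual one.
    (Path(docroot) / "robots.txt").write_text("User-agent: *\nDisallow:\n")
    after = robots_txt_source(docroot)

print(before, after)  # virtual real
```

    So there are never two competing robots.txt files: a given request gets exactly one of them.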

  23. AA
    Member
    Posted 4 years ago #

    Thanks macmanx for the support!

    I have uploaded a robots.txt to my server & it is overriding the virtual. Now I just need to wait & see what the bots think of it ;-) It should be fine...according to GWT, but any changes pertaining to the crawling of a site get me on my toes.

  24. ClaytonJames
    Member
    Posted 4 years ago #

    I only mentioned it because while visiting a site (which I think is yours, but I'm sure you must have more than one) the robots.txt file - which I assumed would be the virtual file at that point, rather than a real one - looked like it might be disallowing all, which would probably mean that your update services were not enabled because of your privacy settings. But if that is just your test site, it probably doesn't matter.

    I now see what I presume is a real robots file.

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content

    Sitemap: http://www.happinesshelp.org/sitemap.xml.gz

    My concern was simply that you didn't place a real file in the directory without making sure you remembered to enable update services in the privacy settings. Looks like you got it sorted out either way.

    Best of luck.

  25. AA
    Member
    Posted 4 years ago #

    Hey Clayton,

    Yea, that's the one I am testing on although it's not a test site, just not the one with a lot of traffic :-)

    Prior to me uploading the actual robots.txt, the virtual looked like this:

    User-agent: *
    Disallow:

    Sitemap: http://www.mysite [dot] com/sitemap.xml.gz

    So, although WP lists "Disallow:" in the virtual, it wasn't followed by "/"...so it was Allowing. Sounds like a riddle, no? ;-)
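
    For anyone who wants to check the riddle rather than trust it, here is a short sketch using the robots.txt parser from Python's standard library. The helper name and the example.com URL are placeholders, and this shows how Python's parser reads the rules, which is not necessarily identical to every search engine's behaviour.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


open_rules = "User-agent: *\nDisallow:\n"     # empty value: nothing is blocked
closed_rules = "User-agent: *\nDisallow: /\n"  # "/" blocks the whole site
page = "http://www.example.com/some-post/"

print(allows(open_rules, page))    # True
print(allows(closed_rules, page))  # False
```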

    Thanks again!

    PS- Would you mind terribly removing the link to my site? I just prefer to not have the thread/url show up in search results. I would appreciate it!

  26. ClaytonJames
    Member
    Posted 4 years ago #

    No riddle. I thought I saw a forward slash when I looked the first time.

    I would remove it for you if I could, but it's too old for me to edit. You placed it in a thread here a while back. That's how I found your site so easily.

    //wordpress.org/support/topic/287962?replies=5

    Maybe if you put a modlook tag on both topics, a moderator will be kind enough to break the links.

  27. head1ess
    Member
    Posted 3 years ago #

    Alan, I'm sure you figured this out by now, but I just wanted to add my own summary, as I have read this whole thread and think I've got it now! So, just in case anyone else is struggling with it:

    The Google XML sitemap plugin writes to a virtual robots.txt. All it writes is

    Sitemap: http://my site.com /sitemap.xml.gz

    This virtual robots.txt does not physically exist anywhere, and the plugin will only write this line if you check that box in the XML sitemap options.

    If your theme already has a physical robots.txt file in the root (which mine did!), Google et al. will read that and ignore the virtual one.

    So the choice is yours: uncheck the box and add the line

    Sitemap: http://my site.com /sitemap.xml.gz

    to the physical robots.txt via FTP, Notepad, etc.,

    or just delete the physical robots.txt and let the sitemap plugin write the sitemap URL.
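
    For the first option, here is a small Python sketch of adding the Sitemap line to the contents of an existing robots.txt. The function name and the example.com URL are placeholders for illustration; you would still upload the result yourself.

```python
def with_sitemap(robots_txt: str, sitemap_url: str) -> str:
    """Return robots.txt content that includes a Sitemap line for sitemap_url."""
    wanted = f"Sitemap: {sitemap_url}"
    lines = robots_txt.splitlines()
    # Leave the content alone if the line is already there.
    if any(line.strip() == wanted for line in lines):
        return robots_txt
    # Separate the Sitemap line from the rules with a blank line.
    if lines and lines[-1].strip():
        lines.append("")
    lines.append(wanted)
    return "\n".join(lines) + "\n"


result = with_sitemap("User-agent: *\nDisallow:\n",
                      "http://www.example.com/sitemap.xml.gz")
print(result)
```

    Running it twice on the same content does not add a second copy of the line.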

    I think the creators of the Google XML sitemap plugin could have explained this a bit better on their site!

    Phew, why is SEO and making your site cool for search engines such a black art?

Topic Closed

This topic has been closed to new replies.
