WordPress.org Forums

No robots.txt file (34 posts)

  1. emmern
    Member
    Posted 4 years ago #

    Hi.

    I have installed my blog on my own server (MAMP, on my Mac).
    Yesterday I added it to Google Webmaster Tools. I woke up today to multiple errors saying the robots.txt file is not accessible. I tested, and I could not access the file either. I have tried disabling all of my plugins (including All In One SEO Pack, Google Analytics for WordPress, Google XML Sitemaps and Maintenance Mode), but that did not help.

    I'm pretty lost right now and have no idea what to do.
    I know WordPress generates a virtual robots.txt file, but I should be able to access it through my browser, shouldn't I?

    Thanks.

  2. Robert Chapin
    Member
    Posted 4 years ago #

    yep.

  3. vangrog
    Member
    Posted 4 years ago #

    Make sure the option "I would like my blog to be visible to everyone" is checked (it's under the "Privacy" settings).

    And don't place a robots.txt in your root; it's created virtually.

    Cheers

  4. Tara
    Member
    Posted 4 years ago #

    Hi vangrog,

    And don't place a robots.txt in your root; it's created virtually.

    Could you please clarify:

    1. The root of the blog directory? e.g., mysite/blog/robots.txt

    2. Or the root of my domain? e.g., mysite/robots.txt

    Thanks

  5. vangrog
    Member
    Posted 4 years ago #

    In the root; robots.txt files placed in subdirectories are ignored.

    I'm not sure how it works with WP installed in a folder; I have never tried. But as a rule it's got to be in the root of your domain. Maybe someone with more expertise can clarify how this goes for WP installed in a folder.

    P.S.: but note I said earlier not to add one, since WP creates it virtually. Considering your blog seems to be in a folder, though, I don't know how it'd work. You can try to create a robots.txt, add it to your domain root, and see what happens.

    Cheers

  6. Tara
    Member
    Posted 4 years ago #

    Thanks vangrog,

    I do have a robots.txt in the root of my domain, e.g., /mysite/robots.txt

    I have none in the WP blog directory, e.g., /mysite/blog/

    Is that how it's supposed to be? Thanks

  7. vangrog
    Member
    Posted 4 years ago #

    Sounds right to me. You can set rules for your WP blog using the robots.txt you've got in your root, such as (supposing your WP is in yourdomain.com/blog):

    User-agent: Googlebot-Image
    Disallow: /blog/
    
    User-agent: *
    Disallow: /blog/*.js$
    Disallow: /blog/*.css$
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/

    =======

    And remember that search engines treat subdomains pretty much like independent sites. So, if you have a subdomain for your blog such as blog.mydomain.com, the blog folder (the subdomain's root) is the place for robots.txt.

    If you still get errors, try adding a real robots.txt to your blog folder. If a robots.txt exists there, it will prevail over the virtual one (in other words, it will disable it).
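    As a side note, rules like these can be sanity-checked offline with Python's standard-library robots.txt parser (a sketch; example.com stands in for your real domain, and the bot names are illustrative). The stdlib parser does not understand the `*`/`$` wildcard extensions, so this sketch only exercises the plain prefix rules:

```python
import urllib.robotparser

# The prefix rules suggested above, as served from the domain root.
rules = """\
User-agent: Googlebot-Image
Disallow: /blog/

User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot-Image matches its own group, so the whole blog is off limits to it.
print(rp.can_fetch("Googlebot-Image", "http://example.com/blog/my-post/"))  # False

# Every other bot is only kept out of the admin and includes folders.
print(rp.can_fetch("SomeBot", "http://example.com/blog/wp-admin/"))  # False
print(rp.can_fetch("SomeBot", "http://example.com/blog/my-post/"))   # True
```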

    Cheers

  8. Tara
    Member
    Posted 4 years ago #

    Hi vangrog, thanks for your time and patience.

    Please bear with me. I am totally confused about this robots.txt stuff.

    I already have MY robots.txt in the root of my domain, from even BEFORE I installed WP.

    This robots.txt has instructions for many files and directories of my website other than WP.

    What I am gathering is that WP also creates a virtual robots.txt??
    Where does it create it?
    Does it create it in the root of my domain (/mysite/) or in the WP directory (/mysite/blog/)?

    If the virtual robots.txt is created in the root of my domain, then what happens to MY OWN robots.txt located there?

    Does one cancel the other out? Which one?

    Please help me understand this.

    I thank you very much.

    edit:
    I just checked in my browser:
    1. when I type http://www.mysite/robots.txt, I see MY own robots.txt with all my contents in it

    2. when I type http://www.mysite/blog/robots.txt
    I see this:
    User-agent: *
    Disallow:

    So where do I go from here?

  9. vangrog
    Member
    Posted 4 years ago #

    That's the point. As I said, I'm not sure what WP tries to do when you have it installed inside a folder. As a rule, a virtual robots.txt would be useless in its folder, since search engines ignore those files in directories (unless, as I wrote above, you use a subdomain).

    Maybe WP tries to add it to your root the same way, even if it is installed in a folder. But, as I commented above too, a virtual robots.txt will not work if there is already a real robots.txt in place.

    If it were me, I'd create rules for WP using the robots.txt that already exists in the domain root. It's supposed to work like that.

    What's the error you get? Does it happen when you try to fetch mydomain.com/robots.txt or mydomain.com/blog/robots.txt?

  10. Tara
    Member
    Posted 4 years ago #

    Thanks vangrog for your time and patience.

    I just checked in my browser:
    1. when I type http://www.mysite/robots.txt, I see MY own robots.txt with all my contents in it

    2. when I type http://www.mysite/blog/robots.txt
    I see this:

    User-agent: *
    Disallow:

    This seems to be the so-called virtual robots.txt.
    It does not show anything other than what I have quoted above.

    So where do I go from here? Should I add what you suggested to MY OWN robots.txt?
    That is:

    User-agent: Googlebot-Image
    Disallow: /blog/

    User-agent: *
    Disallow: /blog/*.js$
    Disallow: /blog/*.css$
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/

  11. vangrog
    Member
    Posted 4 years ago #

    Yes, those are the 2 lines WP creates. I prefer to use a manually made robots.txt; it loads faster and is easier to edit.

  12. vangrog
    Member
    Posted 4 years ago #

    But if you prefer to keep the virtual one, you can hack the file functions.php inside your wp-includes folder.

    Find this code (around line 1720):

    function do_robots() {

    And edit the function body so it echoes your own rules, for example:

    do_action( 'do_robotstxt' );

    if ( '0' == get_option( 'blog_public' ) ) {
        echo "User-agent: *\n";
        echo "Disallow: /\n";
    } else {
        echo "User-agent: Googlebot-Image\n";
        echo "Disallow: /\n";
        echo "\n";
        echo "User-agent: *\n";
        echo "Disallow: /*.js$\n";
        echo "Disallow: /*.css$\n";
        echo "Disallow: /cgi-local/\n";
        echo "Disallow: /wp-admin/\n";
        echo "Disallow: /wp-includes/\n";
        echo "\n";
        echo "Sitemap: http://mydomain.com/sitemap.xml.gz\n";
    }

  13. Tara
    Member
    Posted 4 years ago #

    Thanks vangrog so much.

    In MY OWN robots.txt, I have this:

    User-agent: *
    Disallow: /blog/wp-

    1) Is this OK and sufficient?

    2) Or should I replace

    Disallow: /blog/wp-

    with what you recommended earlier? That is:

    User-agent: Googlebot-Image
    Disallow: /blog/

    User-agent: *
    Disallow: /blog/*.js$
    Disallow: /blog/*.css$
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/

    Thanks again.

  14. vangrog
    Member
    Posted 4 years ago #

    P.S.: after making whatever change you decide on, watch whether robots (especially Google, MSN and Slurp, which do respect the rules) actually follow them. I'm not sure they will, considering the file is placed inside a folder and not in the root. If they disregard the rules, set your WP rules in your main robots.txt (the one in your domain root).

    Cheers

  15. vangrog
    Member
    Posted 4 years ago #

    If you want to forbid robots completely, in your root robots.txt add this:

    User-agent: *
    Disallow: /blog/

    And, as it won't hurt if it doesn't work, set WP to disallow it (under the "privacy" settings).

    It'll create this for its folder:

    User-agent: *
    Disallow: /
  16. Tara
    Member
    Posted 4 years ago #

    If you want to forbid robots completely, in your root robots.txt add this:

    User-agent: *
    Disallow: /blog/

    Is it OK to forbid robots completely? Wouldn't that stop robots from indexing new blog posts, etc.?

    And, as it won't hurt if it doesn't work, set WP to disallow it (under the "privacy" settings).

    It'll create this for its folder:

    User-agent: *
    Disallow: /

    I am sorry I don't quite follow this.

    Thanks again

  17. vangrog
    Member
    Posted 4 years ago #

    I mean: I don't think robots will fully follow the rules set in a file inside a folder, since they'll follow what's in the file in the root. But it's better to avoid conflicts if you're going to have both files. So, whatever rule you choose, use the same one in the root and in the directory (just adapt the path, of course: in the root, use /blog/ before anything; inside the blog folder it's evidently not needed).

    Yes, "Disallow: /" will disallow the whole thing. Leaving it blank ("Disallow:") will allow the whole thing. That's your choice; you can set whichever rule you prefer, just like you did in the robots.txt placed in your domain root.
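    The difference between "Disallow: /" and a blank "Disallow:" can be verified with Python's stdlib robots.txt parser (a quick sketch; the bot name and example.com domain are made up):

```python
import urllib.robotparser

def allowed(rules: str, url: str) -> bool:
    """Parse a robots.txt string and test one URL for a generic bot."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch("AnyBot", url)

# "Disallow: /" forbids everything...
print(allowed("User-agent: *\nDisallow: /\n", "http://example.com/blog/post/"))  # False

# ...while a blank "Disallow:" allows everything.
print(allowed("User-agent: *\nDisallow:\n", "http://example.com/blog/post/"))    # True
```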

  18. Tara
    Member
    Posted 4 years ago #

    thanks vangrog,

    would placing this:

    User-agent: *
    Disallow: /blog/wp-config.php
    Disallow: /blog/wp-admin
    Disallow: /blog/wp-includes
    Disallow: /blog/wp-content

    in MY robots.txt (which is in my domain root) disallow the whole thing as far as the blog is concerned?
    Thanks for your continued guidance and help.

  19. vangrog
    Member
    Posted 4 years ago #

    If you want to understand better how to set rules, read this:

    http://www.google.com/bot.html

    And just remember that when you set rules for a specific bot (user agent), that's the ruleset it'll follow. It will respect the rules you established for it and ignore the general rules (the ones under "User-agent: *").
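    A quick way to see this specific-ruleset behavior, sketched with Python's stdlib robots.txt parser (the bot names, paths, and example.com domain are illustrative):

```python
import urllib.robotparser

# If a bot has its own group, it ignores the "User-agent: *" group entirely.
rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot follows only its own group: /private/ is out, the rest is in.
print(rp.can_fetch("Googlebot", "http://example.com/private/page"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/blog/post"))     # True

# Any other bot falls back to the * group and is blocked everywhere.
print(rp.can_fetch("OtherBot", "http://example.com/blog/post"))      # False
```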

    Cya

  20. vangrog
    Member
    Posted 4 years ago #

    The rules you wrote above read like this:

    wp-config.php is forbidden (anyway, that file should be protected with .htaccess; adding it to robots.txt is meaningless)

    these folders are also forbidden:
    root/gurblog/wp-admin
    root/gurblog/wp-includes
    root/gurblog/wp-content

    All the rest is allowed.

  21. Tara
    Member
    Posted 4 years ago #

    thanks vangrog,

    Great help! With your help, I am beginning to understand a bit.

    Last question before I let you go: having this in MY robots.txt:

    User-agent: *
    Disallow: /blog/wp-

    1) is it going to disallow the whole blog, including indexing of my blog by the search engines?

    Thanks

  22. vangrog
    Member
    Posted 4 years ago #

    Forget the "wp-" part.

    Use this: Disallow: /blog/

    That'll disallow the whole blog, yes: it forbids access to any subfolder and any file inside it, and this way forbids indexing as well (at least for robots that respect robots.txt; remember there are many bad bots around, and for those the only way is to block them in your .htaccess).
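    A small sketch with Python's stdlib robots.txt parser confirms that a /blog/ prefix rule covers every nested path (the example.com domain and bot name are placeholders):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse("User-agent: *\nDisallow: /blog/\n".splitlines())

# Everything under /blog/ is covered, however deep the path goes.
for url in ("http://example.com/blog/",
            "http://example.com/blog/2010/05/some-post/",
            "http://example.com/blog/wp-content/image.png"):
    print(rp.can_fetch("AnyBot", url))  # False each time

# Paths outside /blog/ stay reachable.
print(rp.can_fetch("AnyBot", "http://example.com/about.html"))  # True
```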

    Cheers and cya

  23. emmern
    Member
    Posted 4 years ago #

    My problem is that I don't have a virtual file.
    There is no robots.txt file in the root directory of my domain either.

  24. emmern
    Member
    Posted 4 years ago #

    Anyone?

  25. vangrog
    Member
    Posted 4 years ago #

    The virtual file, as the name suggests, is virtual. It doesn't really exist as a file; WP creates it when it's requested.

    A real robots.txt will only be on your host if you create it yourself, manually, and then upload it. In that case, the real file will prevail over the virtual one.

    Cheers

  26. emmern
    Member
    Posted 4 years ago #

    Quote:

    My problem is that I don't have a virtual file.
    There is no robots.txt file in the root directory of my domain either.

  27. vangrog
    Member
    Posted 4 years ago #

    :-p

  28. emmern
    Member
    Posted 4 years ago #

    Still not solved. Doesn't anyone have the slightest idea where this comes from?

  29. Tara
    Member
    Posted 4 years ago #

    The way I located my virtual file was to call it in my browser like this:
    http://www.mysite.com/blog/robots.txt

    Then I see this in my browser:
    User-agent: *
    Disallow:

    Hope this helps in locating your file.

  30. justmax
    Member
    Posted 4 years ago #

    I just recently noticed one thing with WordPress (using the latest 2.9.2; also tried on lower versions, standard WordPress and WordPress MU).

    I'm currently building a site with only pages (no posts). When visiting the /robots.txt URL on my blog, I receive a 404 "page not found" error instead of a virtual robots.txt file. I could not understand why some blogs I've built had a virtual robots.txt and some had not. Because I'm using WordPress MU, I'd rather create a global mu-plugin that writes lines to every blog's virtual robots.txt file automatically when it's called. This let me add output to the virtual robots.txt file programmatically.

    When I later added a public post to the blog with only pages, WordPress suddenly decided to activate the virtual robots.txt file. If I delete the posts, the virtual robots.txt disappears again. If I set the posts to private, then the virtual robots.txt file is only available to me as admin.

    This must clearly be a bug in WordPress that still no one has resolved. I find it weird that no one has reported this issue yet.

    Currently the only ways around this issue that I know of are to manually add a robots.txt file to your root folder, install a plugin called KB Robots.txt, or add one post to your blog.

    Hope this helps you, m8.
    Cheers

Topic Closed

This topic has been closed to new replies.
