WordPress.org

No robots.txt file

  • Hi.

    I have installed my blog (on my own server, MAMP, on my Mac).
    Yesterday I added it to Google Webmaster Tools. I woke up today to multiple errors about the robots.txt file not being accessible. I tested, and I could not access the file either. I have tried disabling all of my plugins (including All in One SEO Pack, Google Analytics for WordPress, Google XML Sitemaps and Maintenance Mode), but that did not work.

    I’m pretty lost right now and have no idea what to do.
    I know WordPress generates a virtual robots.txt file, but I should be able to access it through my browser, shouldn’t I?

    Thanks.

Viewing 15 replies - 1 through 15 (of 33 total)
  • yep.

    Make sure you have the option “I would like my blog to be visible to everyone” selected (it’s under “Privacy” in the settings).

    And place no robots.txt in your root, it’s created virtually.

    Cheers

    Moderator t-p

    @t-p

    Hi vangrog,

    And place no robots.txt in your root, it’s created virtually.

    If you can please clarify:

    1. Root of blog Directory? e.g., mysite/blog/robots.txt

    2. or root of my domain? e.g., mysite/robots.txt

    Thanks

    In the root of your domain. A robots.txt placed in a subdirectory is ignored.

    I’m not sure how it works with WP installed in a folder. I have never tried. But as a rule it has to be in the root of your domain. Maybe someone with more expertise can clarify how this goes for WP installed in a folder.

    P.S.: I realize I told you before not to add it, since WP creates it virtually. But considering your blog seems to be in a folder, I don’t know how it would work. You can try creating a robots.txt in your domain root and see what happens.

    Cheers

    Moderator t-p

    @t-p

    Thanks vangrog,

    I do have a robots.txt in the root of my domain, e.g., /mysite/robots.txt

    I have none in the WP blog directory, e.g., /mysite/blog/

    Is that how it’s supposed to be? Thanks

    Sounds right to me. You can set rules for your WP using the robots.txt you’ve got in your root. For example (supposing your WP is in yourdomain.com/blog):

    User-agent: Googlebot-Image
    Disallow: /blog/
    
    User-agent: *
    Disallow: /blog/*.js$
    Disallow: /blog/*.css$
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/

    =======
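
The `*` and `$` in the rules above are wildcard extensions rather than plain prefixes: `*` stands for any run of characters, and a trailing `$` pins the rule to the end of the URL. Major crawlers such as Googlebot understand them, but not every robot does. A minimal Python sketch of that matching, assuming Google-style semantics (the function name `rule_matches` is made up for illustration):

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check a URL path against one Disallow pattern (assumed Google-style)."""
    # A trailing "$" means the pattern must match up to the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Each "*" matches any sequence of characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # Rules always match from the start of the path.
    return re.match(regex, path) is not None
```

Under those assumptions, `/blog/*.js$` covers `/blog/wp-includes/js/jquery.js` but not `/blog/script.js?ver=1`, because with the query string the URL no longer ends in `.js`.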

    And remember that search engines treat subdomains pretty much like independent sites. So, if you have a subdomain for your blog such as blog.mydomain.com, the root folder of that subdomain would be the place for its robots.txt.
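
Put differently, a crawler only ever asks for `/robots.txt` at the top of whatever host it is visiting. A small Python sketch of that lookup, using the placeholder hostnames from this thread (the helper name `robots_txt_url` is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the page's own path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

For a page at `http://mydomain.com/blog/2010/hello-world/` this gives `http://mydomain.com/robots.txt`, while a page on `http://blog.mydomain.com/` maps to `http://blog.mydomain.com/robots.txt`.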

    If you still get errors, try adding a real robots.txt into your blog folder. If a robots.txt exists there, it’ll prevail over the virtual one (in other words: will disable it)

    Cheers

    Moderator t-p

    @t-p

    Hi vangrog, thanks for your time and patience.

    Please bear with me. I am totally confused about this robots.txt stuff.

    I have had MY robots.txt in the root of my domain since even BEFORE I installed WP.

    This robots.txt has instructions for many files and directories of my website, other than WP.

    What I am gathering is that WP also creates a virtual robots.txt?
    Where does it create it?
    Does it create it in the root of my domain (/mysite/), or does it create the virtual robots.txt in the WP directory (/mysite/blog/)?

    If the virtual robots.txt is created in the root of my domain, then what happens to MY OWN robots.txt located in the root of my domain?

    Does one cancel the other out? Which one?

    Please help me understand this.

    I thank you very much.

    edit:
    I just checked in my browser:
    1. when I type http://www.mysite/robots.txt; I see MY own robots.txt with all my contents in it

    2. when I type http://www.mysite/blog/robots.txt
    I see this:

    User-agent: *
    Disallow:

    So where do I go from here?

    That’s the point. As I said, I’m not sure what WP tries to do when you have it installed inside a folder. As a rule, a virtual robots.txt would be useless in its folder, since search engines ignore those files in directories (unless, as I wrote above, you use a subdomain).

    Maybe WP tries to add it in your root the same way, even if it is installed in a folder. But, as I commented above too, the virtual robots.txt will not work if a real robots.txt is already in place.

    If it were me, I’d create the rules for WP in the robots.txt that already exists in the domain root. It’s supposed to work like that.

    What’s the error you get? Does it happen when you try to get mydomain.com/robots.txt or mydomain.com/blog/robots.txt ?

    Moderator t-p

    @t-p

    Thanks, vangrog, for your time and patience.

    I just checked in my browser:
    1. when I type http://www.mysite/robots.txt; I see MY own robots.txt with all my contents in it

    2. when I type http://www.mysite/blog/robots.txt
    I see this:

    User-agent: *
    Disallow:

    This seems to be the so-called virtual robots.txt.
    It does not show anything other than what I have quoted above.

    So where do I go from here? Should I add what you suggested to MY OWN robots.txt?
    That is:

    User-agent: Googlebot-Image
    Disallow: /blog/

    User-agent: *
    Disallow: /blog/*.js$
    Disallow: /blog/*.css$
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/

    Yes, those are the 2 lines WP creates. I prefer a manually made robots.txt; it loads faster and is easier to edit.

    But if you prefer to keep the virtual one, you can hack the file functions.php, inside your wp-includes folder.

    Find this code (around line 1720)

    function do_robots() {

    And edit it:

    do_action( 'do_robotstxt' );

    if ( '0' == get_option( 'blog_public' ) ) {
        // Blog is set to private: keep all robots out.
        echo "User-agent: *\n";
        echo "Disallow: /\n";
    } else {
        // Blog is public: output the custom rules instead of the default.
        echo "User-agent: Googlebot-Image\n";
        echo "Disallow: /\n";
        echo "\n";
        echo "\n";
        echo "User-agent: *\n";
        echo "Disallow: /*.js$\n";
        echo "Disallow: /*.css$\n";
        echo "Disallow: /cgi-local/\n";
        echo "Disallow: /wp-admin/\n";
        echo "Disallow: /wp-includes/\n";
        echo "\n";
        echo "Sitemap: http://mydomain.com/sitemap.xml.gz\n";
    }

    Moderator t-p

    @t-p

    Thanks vangrog so much.

    In MY OWN robots.txt, I have this:

    User-agent: *
    Disallow: /blog/wp-

    1) Is this ok and sufficient?

    2) or should I replace
    Disallow: /blog/wp-

    with what you recommended earlier? That is:

    User-agent: Googlebot-Image
    Disallow: /blog/

    User-agent: *
    Disallow: /blog/*.js$
    Disallow: /blog/*.css$
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/

    Thanks again.
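
A plain rule like `Disallow: /blog/wp-` is a prefix match, so it covers every path under `/blog/` that starts with `wp-`, not just `wp-admin` and `wp-includes`. A quick way to check this, sketched with Python's standard `urllib.robotparser` (the sample paths are illustrative assumptions, not taken from the poster's site):

```python
from urllib.robotparser import RobotFileParser

# The rule quoted in the post above.
rules = """\
User-agent: *
Disallow: /blog/wp-
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path: str) -> bool:
    """True if a generic robot may fetch path under the rules above."""
    return parser.can_fetch("*", path)
```

With this rule, `allowed("/blog/wp-admin/")` and `allowed("/blog/wp-content/uploads/photo.jpg")` are both false, while `allowed("/blog/2010/hello-world/")` is true. So the short rule also keeps robots away from anything under `wp-content`, which may or may not be what is wanted.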

    P.S.: whichever you choose, watch whether robots (especially Google, MSN and Slurp, which do respect the rules) actually follow those rules. I’m not sure they will, considering the file is placed inside a folder and not in the root. If they disregard the rules, set your WP rules in your main robots.txt (the one in your domain root).

    Cheers

    If you want to forbid robots completely, in your root robots.txt add this:

    User-agent: *
    Disallow: /blog/

    And, as it won’t hurt if it doesn’t work, set WP to disallow it (in the “Privacy” settings).

    It’ll create this for its folder:

    User-agent: *
    Disallow: /
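
The difference between the two virtual files comes down to one character: an empty `Disallow:` blocks nothing, while `Disallow: /` blocks everything the file applies to. A rough Python sketch that mirrors the two outputs seen in this thread and checks them with the standard `urllib.robotparser` (the function `virtual_robots` and its `blog_public` argument are stand-ins for the WordPress privacy setting, not WordPress code):

```python
from urllib.robotparser import RobotFileParser


def virtual_robots(blog_public: bool) -> str:
    """Mimic the two virtual robots.txt outputs quoted in this thread."""
    if blog_public:
        return "User-agent: *\nDisallow:\n"   # visible to everyone
    return "User-agent: *\nDisallow: /\n"     # robots blocked


def can_crawl(robots_txt: str, path: str) -> bool:
    """True if a generic robot may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)
```

Here `can_crawl(virtual_robots(True), "/2010/hello-world/")` is true and the same call with `virtual_robots(False)` is false, which is why blocking robots would also stop new posts from being indexed.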
    Moderator t-p

    @t-p

    If you want to forbid robots completely, in your root robots.txt add this:

    User-agent: *
    Disallow: /blog/

    Is it OK to forbid robots completely? Wouldn’t that stop robots from indexing new blog posts, etc.?

    And, as it won’t hurt if it doesn’t work, set WP to disallow it (in the “Privacy” settings).

    It’ll create this for its folder:

    User-agent: *
    Disallow: /

    I am sorry I don’t quite follow this.

    Thanks again

  • The topic ‘No robots.txt file’ is closed to new replies.