WordPress.org


virtual robots.txt driving me CRAZY! (4 posts)

  1. tony b
    Member
    Posted 4 years ago #

    Hi,

    I have searched the net, and as yet have not found an answer to this problem.

    This may end up being rather technical, so input from experienced PHP/WPRESS users would be very much appreciated.

    I am aware that it is commonly held on forums that Google should only look at directives from the root robots.txt; however, if that were the case, I would not be getting the warnings about restricted URLs that I am seeing in Webmaster Tools.

    The problem:

    I have WordPress installed in a subfolder (/blog) of http://www.dating-review-uk.co.uk

    I have already written my own robots.txt file in my root, but noticed that a "virtual" robots.txt is also generated here:

    http://www.dating-review-uk.co.uk/blog/robots.txt

    Initially this was set to "disallow" when I browsed to it directly, even though I had correctly set my privacy setting in the WP admin area. To counter this I installed the KB Robots.txt plugin and configured the robots.txt through that, and all seemed well.

    However, Google Webmaster Tools reported that several URLs were restricted, so I looked into it and found that the non-www version of my site was still returning the "disallow" virtual robots.txt for the WP folder:

    http://dating-review-uk.co.uk/blog/robots.txt

    I have reconfigured my .htaccess to solve the canonical issue (redirect all non-www requests to the www version), however this virtual robots.txt is still served.
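    (For reference, the sort of rule I mean - a non-www to www redirect in the root .htaccess, using mod_rewrite with the domain above - looks something like this:

    ```apache
    RewriteEngine On
    # If the host is the bare domain (no www), 301-redirect to the www version.
    RewriteCond %{HTTP_HOST} ^dating-review-uk\.co\.uk$ [NC]
    RewriteRule ^(.*)$ http://www.dating-review-uk.co.uk/$1 [R=301,L]
    ```

    The R=301 makes it a permanent redirect, which is what search engines expect for canonicalisation.)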

    Now I have considered adding another .htaccess to the folder that WP is in to redirect that specifically, however I am not convinced that this is the best fix.

    So my question is: how do I stop WP from generating a virtual robots.txt in the first place?

    Surely there is somewhere in the code where I can direct WordPress to serve my root file instead of making one up on the fly?

    Or can I simply stop the virtual file being created/served, and have my own version served instead?

    Any help on this would be very much appreciated, as I have only been using WordPress for a few weeks and I'm sure that others have seen this issue in the past.

  2. alism
    Member
    Posted 4 years ago #

    Hi Tony,
    Interesting problem. The short answer is that as you point out, a robots.txt in a subdirectory isn't valid, so just use the one you created in the root directory.

    Looking at the one in your root, there is no official "Allow" in robots.txt. Google and the like will probably understand what you're saying, but robots.txt is an *exclusion* protocol. Everything is allowed by default anyway - just use robots.txt to say what you don't want crawled. Keep it simple. If there's anything a good bot doesn't understand or thinks isn't clear, it'll probably just not crawl it to err on the side of caution. But that's another matter for you to think about.
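    For example, an exclusion-only file for a WordPress install in /blog might be as simple as this (the paths here are illustrative, not your actual file):

    ```
    User-agent: *
    Disallow: /blog/wp-admin/
    Disallow: /blog/wp-includes/
    ```

    Anything not listed is crawlable by default, so there's no need to spell out what *is* allowed.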

    I've never used the KB Robots.txt plugin before, but the description says: "...if you have WordPress installed in a subdirectory (e.g. http://example.com/blog/), then this plugin won't do much for you, since the search engines won't look for http://example.com/blog/robots.txt, only for http://example.com/robots.txt."
    Are you sure that you've a) set that plugin up correctly, b) that it is actually doing what you think it's doing?

    Why WordPress bothers with a virtual robots.txt in a subdirectory is more of a puzzle. There are some .htaccess tricks that'll force it to be served, but I don't think WordPress uses them. I'd just ignore it, put your rules in the root and be done with it. Perhaps someone will chime in though with an answer to that.
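    That said, if you really do want to change what WordPress serves, the virtual file is generated by do_robots(), and recent versions expose a robots_txt filter you can hook. A rough sketch (assuming WordPress 3.0+ for the filter and PHP 5.3+ for the closure; the rules shown are illustrative):

    ```php
    <?php
    /*
     * Sketch: override WordPress's virtual robots.txt output.
     * Goes in a small plugin or the theme's functions.php.
     */
    add_filter( 'robots_txt', function ( $output, $public ) {
        // Discard WordPress's generated rules and return our own.
        $output  = "User-agent: *\n";
        $output .= "Disallow: /blog/wp-admin/\n";
        $output .= "Disallow: /blog/wp-includes/\n";
        return $output;
    }, 10, 2 );
    ```

    But again: since crawlers only honour the root robots.txt, this is mostly cosmetic.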

  3. tony b
    Member
    Posted 4 years ago #

    Hi alism,

    Thanks for the reply.

    The current robots.txt in the root is a result of testing yesterday - I will correct it now to reflect that I want to allow everything other than the standard install stuff.

    My worry is that Google does seem to be referencing the robots.txt in the subdirectory - if it didn't, I wouldn't have the "URL restricted by robots.txt" message showing in Google Webmaster Tools against 141 of my URLs.

    I hope that this is a residual effect from the day I installed WP, when the default disallow was referenced (dunno how, in a subfolder??)

  4. alism
    Member
    Posted 4 years ago #

    Take a look at your server log files to see exactly what Googlebot is pulling down. There's gold in them there logs. :-)
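    For example, with an Apache-style access log, something like this pulls out Googlebot's robots.txt fetches (the log path and sample lines below are made up for illustration):

    ```shell
    # Write two sample Apache-style log lines (made up for illustration).
    printf '%s\n' \
      '66.249.66.1 - - [10/May/2010:06:25:01 +0000] "GET /blog/robots.txt HTTP/1.1" 200 24 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"' \
      '192.0.2.5 - - [10/May/2010:06:26:11 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"' \
      > /tmp/access.log

    # Show only Googlebot's requests, then narrow to robots.txt fetches.
    grep 'Googlebot' /tmp/access.log | grep 'robots.txt'
    ```

    If Googlebot really is requesting /blog/robots.txt, it'll show up here, status code and all.
    
    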

Topic Closed

This topic has been closed to new replies.