WordPress.org

Ideas

WordPress Needs a Default robots.txt File and More...

  1. Ipstenu (Mika Epstein)
    Half-Elf Support Rogue & Mod

    IIRC, without a trailing slash it only blocks wp-admin, but NOT wp-admin/index.php

    Posted: 2 years ago #
  2. Bryan Hadaway
    Member

    Actually, that's not correct.

    However, I should note that what I've used does run a risk in an extreme scenario.

    I'll explain better so everyone understands:

    WITH a trailing slash.
    - - - - - - - - - - - - - - - -
    This will ONLY block that directory and its contents (sub-directories included, since their URLs also begin with the directory path).

    WITHOUT a trailing slash.
    - - - - - - - - - - - - - - - -
    This will block that directory and its contents and ALL sub-directories and their contents, but also any URL that starts with that string.

    So my solution is actually a lot simpler and more thorough at the same time, though a bit open-ended...

    So, one issue could be this: let's say you write an article:

    "WP Admin: Making it Secure"

    Your URL would come out as:

    http://website.com/wp-admin-making-it-secure/

    Which would in effect get blocked in robots.txt (which we don't want).
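
    A minimal Python sketch of the trailing-slash difference described above. It assumes plain prefix matching only (no wildcards, no Allow rules). The helper name is_blocked and the path /wp-admin/sub/page.php are made up for illustration.

```python
def is_blocked(path, disallow_prefix):
    """Return True if a Disallow rule with this prefix would match the path.

    Simplified model: plain prefix matching only (no wildcards, no Allow rules).
    """
    return path.startswith(disallow_prefix)


# Paths to check; the last one is the article URL from the example above.
paths = [
    "/wp-admin/",
    "/wp-admin/index.php",
    "/wp-admin/sub/page.php",  # hypothetical sub-directory path
    "/wp-admin-making-it-secure/",
]

# Compare the rule with and without the trailing slash.
for rule in ("/wp-admin/", "/wp-admin"):
    for path in paths:
        verdict = "blocked" if is_blocked(path, rule) else "allowed"
        print(f"Disallow: {rule:<11} {path:<30} {verdict}")
```

    Under this model, only the rule without the trailing slash catches the article URL. Both rules match /wp-admin/index.php and the sub-directory path.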

    So, I can see why WordPress has it that way by default. Many of the reasons WordPress does things a certain way are to make them foolproof for the general end-user, which I'm understanding more and more.

    In any case, there are still some valid points throughout this idea topic that could, and perhaps should, be cherry-picked and turned into their own ideas. As far as the default virtual robots.txt goes, I now see there isn't much more they could do to improve it.

    Thanks, Bryan

    Posted: 2 years ago #
  3. Ihor Vorotnov
    Member

    Bryan is right about the trailing slash. Personally, I always install WordPress in a separate folder and use a different folder for uploads. Then my robots.txt looks like this:

    Allow: /{wp-install-dir-name}/static
    Disallow: /{wp-install-dir-name}/wp-

    The first line allows the uploads folder to be indexed; the second one blocks everything starting with wp- (folders and files, recursively). Assuming you have some unique {wp-install-dir-name} (which is good for security), the chances that you'll run into the situation described by Bryan (blocking an article with a URL starting with wp-) are close to zero.
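
    A rough Python sketch of how these two rules could be evaluated. The thread does not say which precedence a crawler applies, so this assumes the longest matching prefix wins (the behaviour described in RFC 9309). The directory name site-core is a hypothetical stand-in for {wp-install-dir-name}, and is_allowed is an illustrative helper, not part of any library.

```python
def is_allowed(path, rules):
    """Decide whether a path may be crawled under (directive, prefix) rules.

    Assumed precedence: the longest matching prefix wins, Allow wins a tie,
    and a path that matches no rule is allowed.
    """
    best = None  # (prefix length, is_allow) of the strongest match so far
    for directive, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), directive == "Allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# "site-core" is a hypothetical stand-in for {wp-install-dir-name}.
RULES = [
    ("Allow", "/site-core/static"),
    ("Disallow", "/site-core/wp-"),
]

for path in (
    "/site-core/static/photo.jpg",
    "/site-core/wp-admin/index.php",
    "/site-core/wp-login.php",
    "/wp-admin-making-it-secure/",
):
    print(path, "->", "allowed" if is_allowed(path, RULES) else "blocked")
```

    Because the Disallow prefix includes the install directory, an article URL at the site root that happens to start with wp- matches no rule and stays crawlable.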

    Posted: 2 years ago #
  4. Melbourne Cup Sweep
    Member

    Would be a good idea in a general sense. More to prevent parts of a site being indexed rather than helping search engines index.

    Your wp-admin and wp-includes contents should never be indexed by Google. Not even Google wants that in its index. Those pages get indexed because of server configuration.

    Almost all the time this is due to httpd.conf missing the following statement.

    <IfModule mod_dir.c>
    DirectoryIndex index.htm index.html index.php index.php3 default.html index.cgi
    </IfModule>

    This tells Apache which index file (index.php, index.html and so on) to serve when a folder is requested. After that has been configured, attempting to open a folder on your server will automatically load its index file, revealing a white screen or a 403 error.
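
    To illustrate the lookup order, here is a small Python sketch of how a DirectoryIndex list is resolved: the server tries the names in order and serves the first one that exists in the requested folder. The function name pick_index_file is made up for this example. What happens when no name matches depends on other server settings and is not modelled here.

```python
import os
import tempfile

# Candidate names, in the order given by the DirectoryIndex line above.
DIRECTORY_INDEX = [
    "index.htm", "index.html", "index.php",
    "index.php3", "default.html", "index.cgi",
]


def pick_index_file(directory, candidates=DIRECTORY_INDEX):
    """Return the first candidate name that exists in the directory, or None."""
    for name in candidates:
        if os.path.isfile(os.path.join(directory, name)):
            return name
    return None


# Demo in a throwaway folder so nothing on the real server is touched.
with tempfile.TemporaryDirectory() as folder:
    print(pick_index_file(folder))  # folder is still empty
    for name in ("default.html", "index.php"):
        open(os.path.join(folder, name), "w").close()
    print(pick_index_file(folder))  # index.php is listed before default.html
```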

    Users with root access to a Linux server can usually open the httpd.conf file over SSH (for example with PuTTY) using this command: $ vi /etc/httpd/conf/httpd.conf

    Webmasters using shared hosting can ask their web host admin to set it for them.

    Believe me, I learnt the hard way. I had hundreds of thousands of WordPress admin/include/cache pages indexed in under a week, which ruined my rankings. robots.txt is only a temporary fix that half works.

    Posted: 2 years ago #
  5. Bryan Hadaway
    Member

    I suppose at the end of the day it lies in our own hands to take care of ourselves.

    Although once something becomes so big, even if it's "free" like WordPress or Google, it enters a whole new tier of responsibility: public responsibility.

    Right now I'm discovering all sorts of privacy/security holes with BlueHost/HostMonster (what WP recommends and I do too).

    I spent two hours with HostMonster support to finally convince them I'm not a dumb end-user and that there are indeed holes that need to be patched.

    I have a ticket going now, so once that gets resolved I'm going to revamp all my .htaccess and robots.txt files. I'd like to put together a definitive .htaccess file to lock down WordPress. I'll share it in an article at that time. Maybe others can help as well.

    Thanks

    Posted: 2 years ago #
  6. mAsT3RpEE
    Member

    I've created a page detailing proper setup of robots.txt for WordPress. I will keep it maintained here:

    http://mast3rpee.tk/mast3rpees-tutorials/robots-txt-file/

    Posted: 1 year ago #
  7. iptvnews
    Member

    Thanks, this is still a useful tip. Recently I ran into issues with a new site getting indexed properly, so I had to use the WP Robots Editor plugin to block /tags pages.

    Posted: 1 week ago #

  • Rating

    17 Votes
  • Status

    This idea has been implemented