WordPress.org

Ideas

WordPress Needs a Default robots.txt File and More...

  1. Bryan Hadaway
    Member

    It takes a surprising amount of work to make WordPress secure and search-engine-friendly. And that's only if you're aware that you might need or want to do something like that, which the average WordPress user is not.

    Security

    The most obvious issue here is that most people do not install WordPress securely because they don't know how, or don't know it's even an option. Using admin as the main username, not adding salts, and leaving the database table prefix set to wp_ are all common, and all bad.

    These shouldn't be optional. At the install screen, it should be mandatory to set both the username and the wp_ table prefix to something unique. Salts should be generated automatically and injected upon install.
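Here is a minimal sketch of what "generate and inject salts on install" could look like. It is written in Python purely for illustration, since WordPress itself is PHP. The eight constant names are the ones in wp-config-sample.php. The character set, the 64-character length, and the `wp` + random letters table-prefix scheme are assumptions, not anything WordPress prescribes.

```python
import secrets
import string
from typing import List

# The eight key/salt constants defined in WordPress's wp-config-sample.php.
SALT_NAMES = [
    "AUTH_KEY", "SECURE_AUTH_KEY", "LOGGED_IN_KEY", "NONCE_KEY",
    "AUTH_SALT", "SECURE_AUTH_SALT", "LOGGED_IN_SALT", "NONCE_SALT",
]

# Assumption: printable ASCII without whitespace, minus ' and \ so each value
# can sit inside a single-quoted PHP string without escaping.
ALPHABET = "".join(
    c for c in string.ascii_letters + string.digits + string.punctuation
    if c not in "'\\"
)


def random_secret(length: int = 64) -> str:
    """Return a cryptographically random string drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def salt_defines() -> List[str]:
    """Build one define() line per constant, ready to paste into wp-config.php."""
    return [f"define('{name}', '{random_secret()}');" for name in SALT_NAMES]


def random_table_prefix(letters: int = 5) -> str:
    """Hypothetical scheme: 'wp' + random lowercase letters + '_', e.g. 'wpxkqro_'."""
    suffix = "".join(secrets.choice(string.ascii_lowercase) for _ in range(letters))
    return f"wp{suffix}_"


if __name__ == "__main__":
    for line in salt_defines():
        print(line)
    print(f"$table_prefix = '{random_table_prefix()}';")
```

The `secrets` module is used rather than `random` because these values guard login cookies and nonces, so they need to be unpredictable.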

    SEO

    Most people have no idea that hundreds of URLs from their wp-admin and wp-includes folders end up indexed in Google and other search engines.

    This should never happen. WordPress definitely needs to drop a robots.txt file that blocks access to all behind-the-scenes files and perhaps a default .htaccess file to actually enforce it.

    All folders should, at the very least, have index files to block directory browsing, or have the equivalent option set in .htaccess.
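One low-tech way to check the "every folder has an index" idea is to walk an install and list the directories that have no index file. The Python sketch below is only an illustration. Which filenames count as an index (`index.php`, `index.html`, `index.htm`) is an assumption, because it depends on how the web server is configured. The placeholder index.php in the demo mirrors the "Silence is golden" file WordPress ships in wp-content.

```python
import os
import tempfile
from pathlib import Path
from typing import List

# Assumption: the filenames the web server treats as a directory index.
INDEX_NAMES = {"index.php", "index.html", "index.htm"}


def dirs_without_index(root: str) -> List[str]:
    """Return paths (relative to root, '.' for root itself) of directories lacking an index file."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if not INDEX_NAMES.intersection(filenames):
            missing.append(os.path.relpath(dirpath, root))
    return sorted(missing)


def demo() -> List[str]:
    """Build a throwaway tree where only wp-content/uploads lacks an index file."""
    with tempfile.TemporaryDirectory() as root:
        Path(root, "wp-content", "uploads").mkdir(parents=True)
        Path(root, "index.php").write_text("<?php\n")
        Path(root, "wp-content", "index.php").write_text("<?php\n// Silence is golden.\n")
        return dirs_without_index(root)


if __name__ == "__main__":
    print(demo())
```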

    Since there's already an option of whether to share your site with search engines or not in the first place, it only seems reasonable to have it done properly should you choose to share your site.

    Bottom Line

    I know there's documentation on this:
    http://codex.wordpress.org/Hardening_WordPress
    http://codex.wordpress.org/Search_Engine_Optimization_for_WordPress

    But the problem is that nine out of ten people who use WordPress don't know enough to even look into such efforts in the first place. They have no idea. The basics should already be covered for them.

    This would take next to no work at all; it's just a matter of whether or not to include a default robots.txt file similar to the following:

    User-agent: *
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /wp-includes/js
    Disallow: /trackback
    Disallow: /category/*/*
    Disallow: */trackback
    Disallow: /*?*
    Disallow: /*?
    Disallow: /*~*
    Disallow: /*~
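Several lines in this proposal rely on `*` wildcards. Google honors these, and RFC 9309 (the Robots Exclusion Protocol) later standardized them. Plain prefix matching does not understand them, and neither does Python's `urllib.robotparser`. The sketch below is a simplified Python matcher for Disallow patterns only: `*` matches any run of characters, and a trailing `$` anchors the end. It ignores Allow lines and longest-match precedence, so treat it as an approximation for checking which paths the list would block, not as a full robots.txt implementation.

```python
import re
from typing import List

# The Disallow values from the proposed robots.txt above.
PROPOSED_DISALLOW = [
    "/wp-admin", "/wp-includes", "/wp-content/plugins", "/wp-content/cache",
    "/wp-content/themes", "/wp-includes/js", "/trackback", "/category/*/*",
    "*/trackback", "/*?*", "/*?", "/*~*", "/*~",
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern: '*' = any characters, trailing '$' = end of path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(disallow: List[str], path: str) -> bool:
    """True if any Disallow pattern matches from the start of the path (query string included)."""
    return any(pattern_to_regex(rule).match(path) for rule in disallow)


if __name__ == "__main__":
    for path in ["/wp-admin/options.php", "/?s=robots", "/?p=123",
                 "/category/news/page/2/", "/wp-content/uploads/2012/05/photo.jpg"]:
        print(path, "blocked" if is_blocked(PROPOSED_DISALLOW, path) else "allowed")
```

One consequence worth noticing: with WordPress's default (non-pretty) permalinks, posts live at URLs like `/?p=123`. On such a site the `/*?` rules would block every post, not just search results.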

    Thanks, Bryan

    Posted: 2 years ago
  2. Ipstenu (Mika Epstein)
    Administrator

    WP has a virtual robots.txt file, actually. By default, WP's robots.txt contains:

    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-includes/

    So it's not a matter of WP not hiding those folders; it's a matter of robots not honoring the file.
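To see what those two default rules do and don't cover, Python's standard `urllib.robotparser` can evaluate them in memory without fetching anything from a server. This is only an illustration of how a compliant crawler reads the rules. `example.com`, the sample paths, and the `Googlebot` user-agent string are placeholders. Also note that robots.txt controls crawling, which is not quite the same thing as indexing.

```python
from urllib.robotparser import RobotFileParser

# The default rules quoted above for WordPress's virtual robots.txt.
VIRTUAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def can_crawl(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt text in memory and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for path in ["/wp-admin/options.php", "/wp-includes/js/jquery/jquery.js",
                 "/2012/05/hello-world/"]:
        print(path, can_crawl(VIRTUAL_ROBOTS_TXT, path))
```

Because the file only has a `User-agent: *` group, any crawler name falls through to those same two rules.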

    Posted: 2 years ago
  3. Bryan Hadaway
    Member

    Clearly, that's an issue in and of itself, then. Simple fix.

    Because when I drop in a real robots.txt file, Google (the only search engine I care about) respects it.

    So, it still needs to be an actual file. Also, it should be updated to:

    User-agent: *
    Disallow: /wp-admin
    Disallow: /wp-includes

    And you bring up a good point: some bots will never respect robots.txt, whether it's done properly or not. So, to be doubly sure, .htaccess should actually prevent bots from accessing those areas.

    Also, I always wonder why Google (the wise old bird that it is), which knows and loves WordPress all too well and already makes accommodations for it, doesn't simply make a universal rule not to index the hundreds of useless background files.

    I'm sure they're aware, but it's probably their policy not to make judgment calls like that and to leave the thing wide open. So it's up to the user, which, as we know, is a failed concept.

    So the simple solution is for WordPress to drop in both a default .htaccess file and a default robots.txt file. And actual files, too, not just virtual ones, because the virtual one is clearly being ignored, as you pointed out, and search engines can obviously tell the difference between a soft file and a hard one.

    Thanks, Bryan

    Posted: 2 years ago
  4. Ipstenu (Mika Epstein)
    Administrator

    There can never be a default .htaccess or robots.txt in the install package, for the same reason there isn't a wp-config.php and we have WP create or write to those files instead: you would kill anyone who has to do a manual upgrade, because uploading the new release would overwrite their customized copies.

    Full stop.
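The reasoning here is that a file shipped in the release archive gets overwritten whenever someone uploads the archive by hand, while a file the software generates at runtime does not. Below is a minimal Python sketch of the "create it only if it isn't there" side of that approach, using exclusive-create mode. The `write_default_if_absent` helper and the default rules it writes are hypothetical, not WordPress code.

```python
import os
import tempfile
from typing import Tuple

# Hypothetical default content, matching the virtual rules quoted earlier in the thread.
DEFAULT_ROBOTS_TXT = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /wp-includes/\n"


def write_default_if_absent(path: str, content: str) -> bool:
    """Create `path` with `content` unless it already exists; never overwrite.

    Returns True if the file was created, False if an existing file was left alone.
    """
    try:
        # Mode "x" fails if the file already exists, so a customized copy survives.
        with open(path, "x", encoding="utf-8") as fh:
            fh.write(content)
        return True
    except FileExistsError:
        return False


def demo() -> Tuple[bool, bool, str]:
    """Create the default, let the 'site owner' customize it, then try again."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "robots.txt")
        first = write_default_if_absent(path, DEFAULT_ROBOTS_TXT)
        # Simulate the site owner editing the file afterwards.
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("User-agent: *\nDisallow: /private/\n")
        second = write_default_if_absent(path, DEFAULT_ROBOTS_TXT)
        with open(path, encoding="utf-8") as fh:
            kept = fh.read()
        return first, second, kept


if __name__ == "__main__":
    print(demo())
```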

    However, the virtual robots.txt that already exists is honored by Google.

    Posted: 2 years ago
  5. Bryan Hadaway
    Member

    That's a good point. But hasn't WordPress changed the way its update API works so that it only updates the files that need updating?

    Also, I thought we had already agreed that the virtual robots.txt doesn't work ideally, and not just because of bad bots that ignore robots.txt?

    Do you know which version introduced the virtual robots.txt? Perhaps some of my sites are old enough that their wp-admin and wp-includes URLs got indexed before that update?

    Still, some improvement here definitely needs to come to fruition in some capacity.

    Thanks, Bryan

    Posted: 2 years ago
  6. Ipstenu (Mika Epstein)
    Administrator

    Yes, but remember what we tell people if the auto-upgrade fails. "Upload the files manually." :)

    And the virtual robots.txt works 100% as well as the physical one. It's been there since ... version 2.9 I know for sure. I THINK earlier.

    There are a lot of plugins that let you customize the robots.txt file (like Yoast's SEO plugin).

    Posted: 2 years ago
  7. Bryan Hadaway
    Member

    Okay, I believe I first installed my blog on May 21, 2009, and that's the same blog that had hundreds of wp-admin and wp-includes URLs indexed. So I must have just missed the update.

    It's good to know that a virtual robots.txt has since been in place taking care of this issue and that it does indeed work after all.

    At this point, should I make a new idea topic for folder security so the remaining issues are less buried?

    That is, actually blocking access to all back-end folders with index files, and making installs more secure.

    Thanks, Bryan

    Posted: 2 years ago
  8. Ipstenu (Mika Epstein)
    Administrator

    That suggestion has been made many times, and the problem remains: how can you do it so that it works on all servers?

    I can do it on mine, but my dad's is different and has to have different settings. They're both Linux. Toss in a Windows, or an nginx install, and suddenly it's all crazy pants :/

    Posted: 2 years ago
  9. Bryan Hadaway
    Member

    Well, WordPress loves its conditionals :). WordPress already recommends BlueHost, check. Then it could have a conditional for the install type based on server type, which I'm sure it already has to do in a number of ways anyway, like permalinks for Windows.
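The kind of per-server conditional being suggested might look like the Python sketch below. The detection idea loosely mirrors how WordPress sets its `$is_apache`, `$is_nginx` and `$is_IIS` globals from the `SERVER_SOFTWARE` string, but the function names and the mapping are hypothetical. The directives themselves are the standard ways to switch off directory listings: `Options -Indexes` for Apache (.htaccess), `autoindex off;` for nginx, and `<directoryBrowse enabled="false" />` for IIS.

```python
from typing import Optional, Tuple

# Hypothetical mapping: server family -> (where the setting lives, directive that disables listings).
LISTING_OFF = {
    "apache": (".htaccess in the WordPress root", "Options -Indexes"),
    "nginx": ("server block in nginx.conf", "autoindex off;"),
    "iis": ("<system.webServer> in web.config", '<directoryBrowse enabled="false" />'),
}


def detect_server(server_software: str) -> Optional[str]:
    """Classify a SERVER_SOFTWARE string; LiteSpeed reads .htaccess, so it counts as Apache."""
    s = server_software.lower()
    if "apache" in s or "litespeed" in s:
        return "apache"
    if "nginx" in s:
        return "nginx"
    if "microsoft-iis" in s:
        return "iis"
    return None


def listing_off_snippet(server_software: str) -> Optional[Tuple[str, str]]:
    """Return (location, directive) for the detected server, or None if unknown."""
    family = detect_server(server_software)
    return LISTING_OFF.get(family) if family else None


if __name__ == "__main__":
    for software in ["Apache/2.2.22 (Ubuntu)", "nginx/1.2.1", "Microsoft-IIS/7.5", "lighttpd/1.4.28"]:
        print(software, listing_off_snippet(software))
```

Only the Apache and IIS cases correspond to a file WordPress could write itself. nginx has no per-directory equivalent of .htaccess, so its directive has to go into the server configuration (although nginx already defaults to `autoindex off`). That is part of why a one-size-fits-all default is hard.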

    I hate Windows servers for websites, by the way. Not the best environment for web design at all.

    What about the more secure MySQL install? Has that been suggested? Since WordPress requires a certain version of MySQL anyway, enforcing more secure installs couldn't get the lines crossed, could it? That's pretty much dummy-proof.

    Thanks, Bryan

    Posted: 2 years ago
  10. RonStrilaeff
    Member

    Hi guys, something still needs clarification here.

    The WordPress virtual robots.txt file looks like this:

    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-includes/

    However, Bryan suggested that it should look like this:

    User-agent: *
    Disallow: /wp-admin
    Disallow: /wp-includes

    But aren't those almost the opposite of each other? I may have it wrong (and please set me straight), but I thought the trailing "/" in the WP version says "disallow all files below this folder," while the absence of the "/" in the second example means disallow none of the files in that folder or any below it.

    So we want the default WP version, which accomplishes the goal of preventing well-behaved robots from indexing all the redundant, static WP code files. The second example is flat-out wrong... right? :-)
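For what it's worth, under the original robots.txt convention and RFC 9309, a Disallow value is matched as a prefix of the URL path. That means the two forms are not opposites. `Disallow: /wp-admin` blocks everything `Disallow: /wp-admin/` blocks, plus any other path that merely starts with `/wp-admin`. The Python sketch below checks this with the standard library's `urllib.robotparser`, which implements that prefix rule. `example.com` and the `/wp-admin-tips/` post slug are made-up examples.

```python
from urllib.robotparser import RobotFileParser

WITH_SLASH = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /wp-includes/\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: /wp-admin\nDisallow: /wp-includes\n"


def can_crawl(robots_txt: str, path: str) -> bool:
    """Evaluate robots.txt text in memory for a generic crawler (the '*' group)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", "https://example.com" + path)


if __name__ == "__main__":
    for path in ["/wp-admin/options.php", "/wp-includes/js/jquery/jquery.js",
                 "/wp-admin", "/wp-admin-tips/", "/hello-world/"]:
        print(path, "with slash:", can_crawl(WITH_SLASH, path),
              "without slash:", can_crawl(WITHOUT_SLASH, path))
```

Both forms refuse `/wp-admin/options.php` and allow `/hello-world/`. Only the slash-less form also refuses the bare `/wp-admin` URL and an unrelated post at `/wp-admin-tips/`, so the trailing slash narrows the rule rather than reversing it.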

    Posted: 2 years ago

  • Rating

    17 Votes
  • Status

    This idea has been implemented