WordPress.org

Forums

Asking for opinions on this robots.txt and header.php (6 posts)

  1. namelessinmo
    Member
    Posted 6 years ago

    If I use the following in my header.php:

    <?php if ( $paged > 1 ) {
        echo '<meta name="robots" content="noindex,follow" />';
    } ?>

    <?php if ( is_author() ) {
        echo '<meta name="robots" content="noindex,follow" />';
    } ?>

    <?php if ( is_trackback() ) {
        echo '<meta name="robots" content="noindex,follow" />';
    } ?>

    <?php if ( is_search() ) {
        echo '<meta name="robots" content="noindex,follow" />';
    } ?>

    <?php if ( is_date() ) {
        echo '<meta name="robots" content="noindex,follow" />';
    } ?>

    And if I use the following in the robots.txt:

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /wp-content/uploads
    Disallow: /tag
    Disallow: /author
    Disallow: /trackback
    Disallow: /*trackback
    Disallow: /*trackback*
    Disallow: /*/trackback
    Disallow: /*?*
    Disallow: /*.html/$
    Disallow: /*feed*
    Disallow: /*amp;*
    Disallow: /comments
    Disallow: */comments
    Disallow: /*?*
    Disallow: /*?
    Disallow: /date/
    Disallow: /archive/
    Disallow: /rss/
    Disallow: /about/trackback/
    Disallow: /wp-register.php
    Disallow: /wp-login.php
    Disallow: /2006/
    Disallow: /2007/
    Disallow: /2008/
    Disallow: /iframes/
    Disallow: /recommends/

    # Google Image
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # Google AdSense
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*

    # Internet Archiver Wayback Machine
    User-agent: ia_archiver
    Disallow: /

    # digg mirror
    User-agent: duggmirror
    Disallow: /

    User-Agent: MediaPartners-Google
    Allow: /

    Sitemap: http://www.*.com/sitemap.xml

    How will this turn out? I could use suggestions, because I am very new to this.

    I am sure this has been addressed here, but I just want to be sure my version is going to be OK.

    I used a combination of examples that I found online, and I see some people do almost the opposite. My site is going to have posts that are articles arranged by category. I have permalinks set to show the category name and then the post name, like this:
    /%category%/%postname%.html, with a category base of /. I am doing this to duplicate a site I am moving from articlems, in which the posts end with .html.
    I didn't disallow categories, because I was afraid that would also stop my posts from being followed. Does anyone have an opinion on this?

    I have a few PHP links at the bottom of the index page that I will be updating in the future. Will the search engines follow these with the files above the way they are?

    I also have some PHP pages that I have put into WordPress as posts, and I 301-redirected the old PHP URLs to the new .html ones with .htaccess. Will the search engines follow these OK?

    If you could suggest what to add or what to remove, and if possible why, it would be very much appreciated.

    I am working on a local server right now, so I don't have a link.

    Thank You.

  2. Adam Brown
    Member
    Posted 6 years ago

    A few general comments to start.

    First, there is no such thing as a * in a robots.txt. A couple groups have tried to get it put into the standard, but not all search engines recognize what * means. To avoid unexpected results, you may want to remove lines with * in them, but it's up to you.

    Second, there is only a "disallow", not an "allow." Delete all the "allow" lines in your robots.txt. That's true for every search engine.

    Third, you don't need to specify "follow" in your meta tag. "follow" is assumed unless you override it with "nofollow."
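    A minimal sketch, assuming a standard WordPress theme: the five separate checks in the header.php at the top of the thread can be collapsed into one condition, and since "follow" is the default, the tag can say just "noindex". The helper name nameless_robots_meta() is hypothetical. is_paged() (true on page 2 and later of a listing) stands in for the $paged > 1 test; the other calls are the WordPress conditional tags the original code already uses.

```php
<?php
// Hypothetical helper for header.php (the name nameless_robots_meta is an assumption).
// Returns a robots meta tag for views that tend to duplicate post content:
// paginated pages, author, trackback, search and date archives.
// "follow" is omitted because crawlers assume it unless told "nofollow".
// is_paged(), is_author(), is_trackback(), is_search() and is_date() are
// WordPress conditional tags, so this only works inside a WordPress theme.
function nameless_robots_meta(): string
{
    if ( is_paged() || is_author() || is_trackback() || is_search() || is_date() ) {
        return '<meta name="robots" content="noindex" />' . "\n";
    }
    return '';
}

// Usage, inside <head> in header.php:
// echo nameless_robots_meta();
```

    Returning the string instead of echoing it keeps the function easy to check on its own; in the theme it is simply echoed inside the head section.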

    Now some specifics...

    I have a few PHP links at the bottom of the index page that I will be updating in the future. Will the search engines follow these with the files above the way they are?

    That depends on what the links point to. If they are external links, they aren't covered by robots.txt. Either way, it looks like your index.php isn't being covered by the <meta> tag.

    Another thing. I don't understand why you use this:

    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /wp-content/uploads

    Instead of

    Disallow: /wp-content/
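    To illustrate why the single line is enough: in the original robots.txt convention, with no wildcards, each Disallow value is a plain prefix of the URL path, so /wp-content/ matches everything beneath it. This PHP sketch assumes that simple prefix rule; the function name and sample paths are hypothetical. The same rule bears on the category question earlier in the thread: with /%category%/%postname%.html and a category base of /, every post URL begins with its category path, so disallowing a category would also block its posts.

```php
<?php
// Sketch of classic robots.txt matching (original convention, no wildcards):
// each Disallow value is a plain prefix of the URL path, and an empty value
// blocks nothing. The function name and the sample paths are hypothetical.
function is_disallowed_by_prefix(string $path, array $disallowRules): bool
{
    foreach ($disallowRules as $rule) {
        if ($rule !== '' && strncmp($path, $rule, strlen($rule)) === 0) {
            return true;
        }
    }
    return false;
}

// One rule covers every wp-content subfolder listed in the original file.
$paths = [
    '/wp-content/plugins/some-plugin/script.js',
    '/wp-content/themes/default/style.css',
    '/wp-content/uploads/photo.jpg',
    '/my-category/my-post.html',
];
foreach ($paths as $path) {
    echo $path, ' => ', is_disallowed_by_prefix($path, ['/wp-content/']) ? 'blocked' : 'allowed', PHP_EOL;
}

// With /%category%/%postname%.html and a category base of /, posts sit under their
// category path, so disallowing a category blocks its posts too.
var_dump(is_disallowed_by_prefix('/my-category/my-post.html', ['/my-category/'])); // bool(true)

// Prefixes are not whole path segments: '/author' also matches a hypothetical
// category named 'author-interviews'.
var_dump(is_disallowed_by_prefix('/author-interviews/my-post.html', ['/author'])); // bool(true)
```

    The last case matters with a category base of /, because rules such as /tag or /author then share the top level with category names.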

    A concluding comment:

    Sign up for a Google Webmaster Tools account. They have a cool little tool in there where you can see exactly which URLs on your site will be affected by changes to your robots.txt. (Of course, this requires that Google has found your site.)

  3. namelessinmo
    Member
    Posted 6 years ago

    adamrbrown,

    Thank you for answering. First, the * in the Sitemap line was just a generic placeholder for what I should fill in there. I think I am supposed to include a Sitemap line?

    I saw Allow lines on more than one website that discussed robots.txt, so I thought they were a normal thing to use. I appreciate you pointing that out.

    -------------You said:----------------
    Now some specifics...

    I have a few PHP links at the bottom of the index page that I will be updating in the future. Will the search engines follow these with the files above the way they are?

    That depends on what the links point to. If they are external links, they aren't covered by robots.txt. Either way, it looks like your index.php isn't being covered by the <meta> tag.
    --------

    The PHP pages are from an older site (from before articlems), and they call a .shtml file. They are on the same domain.

    I just don't want the search engines to avoid following the redirects, or the PHP pages that are still on the site, for any reason. I was asking whether they are something the search engines will still follow. I guess you are saying that because I didn't exclude them, the search engines will follow them. Is this correct?

    I wondered why not use Disallow: /wp-content/ instead of all the different versions, but I was just duplicating what I saw others do.
    It makes more sense that way.

    I need to look at the robots.txt test at Google. It is silly, but I am afraid they would index all of the duplicate content right away if I had it wrong. It probably does not work that way; I just want to get it right the first time.

    I will try to clean things up and post it again. Do the header tags I listed seem OK?

    Again thank you for taking the time to help me.

  4. namelessinmo
    Member
    Posted 6 years ago

    I guess you addressed the header the first time when you said:

    Third, you don't need to specify "follow" in your meta tag. "follow" is assumed unless you override it with "nofollow."

    Sorry I missed that. Should I change anything else in the header?

  5. namelessinmo
    Member
    Posted 6 years ago

    First, there is no such thing as a * in a robots.txt. A couple groups have tried to get it put into the standard, but not all search engines recognize what * means. To avoid unexpected results, you may want to remove lines with * in them, but it's up to you.

    I see what you're saying; apparently I was not paying enough attention. I would have to get rid of things like:
    Disallow: /*.html/$
    Disallow: /*feed*
    Disallow: /*amp;*

    Does anyone know how much of the robots.txt sample is necessary? I don't want it to be too complex. I really just want to avoid duplicate content and supplemental results.

  6. ravetildon
    Member
    Posted 6 years ago

    First off, the * (wildcard) is supported by some search engines, as is Allow. Googlebot does recognize those directives, and Google is the main search engine you want to fix WordPress duplicate content for via robots.txt.
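    For reference, Google documents how it reads these extensions: * matches any sequence of characters, a trailing $ anchors the end of the URL, and when both an Allow and a Disallow rule match, the more specific (longer) rule wins, with Allow winning a tie. This PHP sketch approximates that behaviour by turning each pattern into a regular expression and comparing pattern lengths. It is illustrative only: the function names are hypothetical, and a real crawler's parser handles details (such as percent-encoding) that this ignores. The sample rules come from the robots.txt posted at the top of the thread.

```php
<?php
// Approximation of Google-style robots.txt matching (function names are hypothetical):
// '*' matches any run of characters, a trailing '$' anchors the end of the path,
// the longest matching pattern wins, and Allow beats Disallow on a tie.

// Convert one robots.txt path pattern into an anchored PCRE regex.
function robots_pattern_to_regex(string $pattern): string
{
    $anchoredAtEnd = substr($pattern, -1) === '$';
    if ($anchoredAtEnd) {
        $pattern = substr($pattern, 0, -1);
    }
    // Escape regex metacharacters, then turn each escaped '*' into '.*'.
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
    return '#^' . $regex . ($anchoredAtEnd ? '$' : '') . '#';
}

// $rules is a list of [directive, pattern] pairs, directive being 'allow' or 'disallow'.
function is_allowed(string $path, array $rules): bool
{
    $bestLength = -1;
    $allowed = true; // a path that matches no rule may be crawled
    foreach ($rules as [$directive, $pattern]) {
        // An empty pattern (e.g. a bare "Disallow:") matches nothing.
        if ($pattern === '' || preg_match(robots_pattern_to_regex($pattern), $path) !== 1) {
            continue;
        }
        $length = strlen($pattern);
        if ($length > $bestLength || ($length === $bestLength && $directive === 'allow')) {
            $bestLength = $length;
            $allowed = ($directive === 'allow');
        }
    }
    return $allowed;
}

// Two rules from the posted file, plus an explicit Allow for everything else.
$rules = [['disallow', '/*?*'], ['disallow', '/*.html/$'], ['allow', '/']];
$paths = [
    '/my-category/my-post.html',
    '/my-category/my-post.html?replytocom=5',
    '/my-category/my-post.html/',
];
foreach ($paths as $path) {
    echo $path, ' => ', is_allowed($path, $rules) ? 'allowed' : 'blocked', PHP_EOL;
}
```

    Under this reading, "/*.html/$" only matches URLs that end in ".html/" with a trailing slash, so it does not block the .html post URLs themselves.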

    I don't have time to go through each of those items, but Google Webmaster Tools has a great testing tool. Definitely check it out, so you can make sure a bad robots.txt doesn't disallow all the pages on your site.

    Check out this page for sample robots.txt file starting point:
    http://codex.wordpress.org/Search_Engine_Optimization_for_Wordpress#Robots.txt_Optimization

    It looks like a few of the items in your file are duplicates, though.
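    One way to spot such duplicates is a small script. The hypothetical PHP helper below lists rules repeated within the same User-agent group; in the file posted at the top, "Disallow: /*?*" appears twice in the "User-agent: *" group. It only compares text, so rules that Google treats as equivalent but spells differently, such as "/*?" and "/*?*" (Google ignores a trailing *), are not reported.

```php
<?php
// Hypothetical helper: list rules that appear more than once inside the same
// User-agent group of a robots.txt file. Only exact textual repeats are caught.
function find_duplicate_rules(string $robotsTxt): array
{
    $seen = [];       // rules already seen in the current group
    $duplicates = [];
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            $seen = []; // a new group starts; compare rules within one group only
            continue;
        }
        $key = $field . ':' . $value;
        if (isset($seen[$key])) {
            $duplicates[] = ucfirst($field) . ': ' . $value;
        }
        $seen[$key] = true;
    }
    return $duplicates;
}

// Usage with a fragment of the posted file: only the exact repeat is reported.
$sample = "User-agent: *\nDisallow: /*?*\nDisallow: /*?\nDisallow: /*?*\n";
print_r(find_duplicate_rules($sample));
```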

    Good luck!

Topic Closed

This topic has been closed to new replies.
