WordPress.org

[resolved] robots.txt multiple user-agent lines (16 posts)

  1. ryanve
    Member
    Posted 3 years ago #

    I'm looking at the robots.txt file for a client's website and it's written in a way I haven't seen before:

    User-agent: googlebot
    User-agent: slurp
    User-agent: msnbot
    User-agent: teoma
    Disallow: /

    Does the Disallow: / apply to only teoma or does it apply to all 4 robots?

  2. Is that the whole thing?

    The Disallow applies to every user-agent listed, which is ... an odd way to go about it.
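
One way to check the grouping question is to feed the snippet to a parser. This is a minimal sketch using Python's standard `urllib.robotparser`, not a statement about how any particular search engine behaves; the helper name `blocked_agents` and the agent `someotherbot` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The four-agent group from the original question.
ROBOTS_TXT = """\
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
Disallow: /
"""

def blocked_agents(robots_txt, agents, url="/"):
    """Return the agents that may NOT fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]

# "someotherbot" is a hypothetical crawler that is not named in the file.
agents = ["googlebot", "slurp", "msnbot", "teoma", "someotherbot"]
print(blocked_agents(ROBOTS_TXT, agents))
```

With this parser, consecutive `User-agent` lines are collected into one group, so all four named agents share the `Disallow: /` and only the unnamed agent gets through.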

  3. ryanve
    Member
    Posted 3 years ago #

    Thanks! Yes, odd is the word LOL. I also posted this same question on Aardvark and Yahoo Answers and opinions were split: 3 out of 5 people who answered (including you) say it applies to all. No, that's not the whole file. See below:

    User-agent: googlebot
    User-agent: slurp
    User-agent: msnbot
    User-agent: teoma
    User-agent: W3C-checklink
    User-agent: WDG_SiteValidator
    Disallow: /
    Disallow: /js/
    Disallow: /Web_References/
    Disallow: /webresource.axd
    Disallow: /scriptresource.axd
    
    User-agent: Mediapartners-Google*
    Disallow:
    
    User-agent: *
    Disallow: /webresource.axd
    Disallow: /scriptresource.axd
    Disallow: /js/
    Disallow: /Web_References/

    It actually looks like they copied it from this article and then added redundancies. The pages on their site do show up in Google but they show up without snippets. The site has been online since the 90s.

  4. Okay, got out my book o' robots.txt

    The user-agents declared at the top (googlebot, slurp, etc.) will all obey the Disallow lines below them. In THEORY having Disallow: / blocks everything, BUT I know some hosts are hose heads.

    User-agent: Mediapartners-Google*
    Disallow:

    That tells Mediapartners-Google that nothing is disallowed, so it is always allowed in.

    The rest says JUST disallow those sections. It looks like they wanted to be doubly sure, but frankly, you don't need it in BOTH places.
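
The redundancy inside the first group can also be spotted mechanically. The sketch below assumes plain prefix rules only (no wildcards, no empty values); `redundant_rules` is a hypothetical helper, not part of any library.

```python
def redundant_rules(paths):
    """Return Disallow paths already covered by a shorter prefix in the same group."""
    return [
        path
        for path in paths
        if any(other != path and path.startswith(other) for other in paths)
    ]

# Disallow values from the first group and from the catch-all group of the file.
first_group = ["/", "/js/", "/Web_References/", "/webresource.axd", "/scriptresource.axd"]
catch_all = ["/webresource.axd", "/scriptresource.axd", "/js/", "/Web_References/"]

print(redundant_rules(first_group))
print(redundant_rules(catch_all))
```

Because every URL path starts with `/`, the four extra lines in the first group add nothing once `Disallow: /` is present, while the catch-all group has no overlap.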

  5. ryanve
    Member
    Posted 3 years ago #

    I agree, I'm guessing they made a mistake b/c I'm pretty sure they don't want to blacklist Google or those other engines. Those other files they blocked are 404s. I'm prob. going to recommend that they change it to something like:

    User-agent: Mediapartners-Google*
    Disallow:
    
    User-agent: *
    Disallow: /js/

    or simply

    User-agent: *
    Disallow: /js/

    I'm not sure if there's an advantage to explicitly allowing Mediapartners. It should crawl it anyway as long as it's not disallowed. Thanks again!
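
The two proposed files are not quite equivalent, and a parser shows where they differ. This is a rough sketch with Python's standard `urllib.robotparser`. Two things here are assumptions of the sketch rather than anything from the thread: the trailing `*` is dropped from the agent name, because this parser appears to match agent names by substring and does not treat a `*` inside a name as a wildcard, and `/js/app.js` is a hypothetical path.

```python
from urllib.robotparser import RobotFileParser

# First proposal (agent name written without the trailing "*", see note above).
WITH_EXPLICIT_GROUP = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /js/
"""

# Second proposal: the catch-all group only.
CATCH_ALL_ONLY = """\
User-agent: *
Disallow: /js/
"""

def can_fetch(robots_txt, agent, path):
    """Parse `robots_txt` and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

for name, text in [("explicit group", WITH_EXPLICIT_GROUP), ("catch-all only", CATCH_ALL_ONLY)]:
    print(name, can_fetch(text, "Mediapartners-Google", "/js/app.js"))
```

With the explicit group, Mediapartners-Google follows only its own (empty) rule set and may fetch `/js/`. With the catch-all alone, it is held to `Disallow: /js/` like every other crawler. If the AdSense crawler never needs anything under `/js/`, the shorter file is enough.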

  6. I would do this:

    User-agent: *
    Disallow: /js/
    Disallow: /Web_References/
    Disallow: /webresource.axd
    Disallow: /scriptresource.axd
    
    User-agent: Mediapartners-Google
    Allow: /
    
    User-agent: Adsbot-Google
    Allow: /
    
    User-agent: Googlebot-Image
    Allow: /
    
    User-agent: Googlebot-Mobile
    Allow: /
    
    User-agent: Browsershots
    Allow: /
    
    User-agent: Dotbot
    Allow: /

    I find I get better results that way. Also if you're running WP, which I presume you are, I would add in this:

    Disallow: /trackback/
    Disallow: /wp-admin/
    Disallow: /wp-content/
    Disallow: /wp-includes/
    Disallow: /xmlrpc.php
    Disallow: /wp-

    They don't need all that :)

  7. ryanve
    Member
    Posted 3 years ago #

    Cool—that's interesting about the better results—thanks! I'd imagine too that Disallow: would give the same results as Allow: / and I guess the point is to give explicit instructions for the robots you want.

    I'm pretty sure that Disallow: /wp- disallows all the wp- folders. Is there a specific reason to disallow /wp-admin/ etc. separately?

  8. Basically, there's no reason a BOT needs to come look at wp-admin! :) Drops the pings on your site, which reduces traffic, which makes your site happier.

  9. ryanve
    Member
    Posted 3 years ago #

    Oh yea of course. =) I meant I think Disallow: /wp- disallows /wp-admin/ and /wp-content/ and /wp-includes/ or anything else that starts with /wp-

    I guess it doesn't hurt to list all of them but it's redundant isn't it? Does it make a difference you think?

  10. Ah, the folders specifically tell it 'and nothing IN these locations, either!' It's more for the subfiles than the actual folder names.
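
For what it's worth, robots.txt rules are matched as path prefixes, so the question of whether `Disallow: /wp-` alone already covers the subfolders and their files can be tested directly. A minimal sketch with Python's standard `urllib.robotparser`; the agent name `ExampleBot` and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /wp-"])

def allowed(path, agent="ExampleBot"):
    """True if `agent` may fetch `path` when the only rule is Disallow: /wp-"""
    return parser.can_fetch(agent, path)

sample_paths = [
    "/wp-admin/",
    "/wp-content/themes/example/style.css",
    "/wp-includes/js/jquery.js",
    "/wp-login.php",
    "/xmlrpc.php",
    "/trackback/",
]
for path in sample_paths:
    print(path, allowed(path))
```

Under this parser the single prefix rule blocks everything beginning with `/wp-`, including files nested inside those folders. `/xmlrpc.php` and `/trackback/` do not start with `/wp-`, so they still need their own lines.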

  11. cyberbrent
    Member
    Posted 3 years ago #

    Hi Guys,

    Can either of you see whether I'm blocking Google Analytics from tracking my site with this robots.txt setup:

    ------
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /feed
    Disallow: /*/feed
    Disallow: /comments
    Disallow: /author
    Disallow: /tag
    Disallow: /archives
    Disallow: /2011/*
    Disallow: /20*
    Disallow: /iframes
    Disallow: /category/*/*
    Disallow: */trackback
    User-agent: Googlebot
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.wmv$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    Disallow: /*.xlsx$
    Disallow: /*.doc$
    Disallow: /*.pdf$
    Disallow: /*.zip$
    User-agent: *
    Allow: /images
    Allow: /slides
    Sitemap: http://www.meninkilts.com//sitemap_index.xml

    -------

    Cheers Brent

  12. Generally, after a month, it's best to make a new topic :) This one really was resolved (and I'm gonna flag it in a second).

    Anyway, under User-agent: Googlebot it looks like you're blocking all .php files, which may be causing your problems.
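
To see what those extension rules catch, here is a small hand-rolled matcher. It follows the wildcard semantics described in Google's robots.txt documentation and RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end of the URL). As far as I know, Python's `urllib.robotparser` does not implement these wildcards, which is why this sketch does not use it. The function names and sample paths are hypothetical, and only three of the rules are included.

```python
import re

def rule_matches(pattern, path):
    """Match a robots.txt path pattern ('*' wildcard, optional trailing '$') against a path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; every other character is literal.
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
    if anchored:
        return regex.fullmatch(path) is not None
    return regex.match(path) is not None  # unanchored rules are prefix matches

# A subset of the Googlebot group from the file above.
GOOGLEBOT_RULES = ["/*.php$", "/*.js$", "/*.css$"]

def blocked_for_googlebot(path):
    """True if any rule in GOOGLEBOT_RULES matches `path`."""
    return any(rule_matches(rule, path) for rule in GOOGLEBOT_RULES)

for path in ["/wp-login.php", "/wp-content/themes/example/style.css", "/index.php?p=12", "/about/"]:
    print(path, blocked_for_googlebot(path))
```

URLs that end in `.php`, `.js` or `.css` are blocked, while a URL with a query string after `.php` slips past the `$` anchor. Note also that a crawler normally obeys only the most specific group that names it, so Googlebot would likely follow its own group here and ignore both `User-agent: *` groups.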

  13. cyberbrent
    Member
    Posted 3 years ago #

    Huge THANKS! What's your PayPal? Coffee on me :-) Reach me on Twitter: @meninkilts Cheers.

  14. ryanve
    Member
    Posted 3 years ago #

    @cyberbrent Google Analytics relies on the tracking code script. It's totally independent from robots.txt. It looks like you got your robots.txt straightened out. Remember you can always see which URLs are indexed by Google by searching for site:meninkilts.com

  15. cyberbrent
    Member
    Posted 3 years ago #

    Hey Ryanve,

    Well, Ipstenu has been awesome straightening out the robots.txt. But alas, Google Analytics is just not being passed anything. Here is what I've done to try to get it to work (and it worked for years before this site's update to WP):

    1. I've tried just placing the GA code manually at the bottom of the page (no reading by GA).
    2. Have installed Yoast's WordPress SEO plugin and placed the code at the top, in the header (still no reading by GA). Yoast's plugin is authorized via OAuth and is 100% connected to the GA account.
    3. Have reverified my domain with Google (https://www.google.com/accounts/ManageDomains).
    4. Have also verified the domain in Google Webmaster, THREE ways:
    A. DNS verified.
    B. Google HTML doc placed on server and verified.
    C. Verified also with Google Analytics via Google Webmaster (so they are linked, seeing each other as owning the same domain).
    5. Have contacted the hosting company for our VPS, and their team has looked through and can't see what could be causing the issue of GA not picking up our hits on the site.

    * WordPress Stats is working fine as is AWStats on server.

    So why oh why is Google Analytics not picking up the counts?

    PayPal for sure if you can sleuth this one out. It has me totally stumped, and it's been 5 days now with no stats inside GA (just 1 hit per day, from Googlebot, is being registered).

    Tweet me @meninkilts - Cheers Brent

  16. At this point we should probably split off into a new topic, 'cause it's not robots anymore.

    When I load your page, I can see the GA code in there so it's THERE and that's all Google should need... Is UA-2109736-2 your right GA 'code'?

    Are you using any GA filters on their site?

    The only odd thing I see in your source code is, at the bottom, there's this:

    <script type="text/javascript">
    <!-- include google analytics -->

    and then a huge section of code I don't recognize (nor see on my site, and I too am using Yoast). Maybe you have a function or something else that's calling in the code twice?

    You could start going down the list here: https://www.google.com/support/analyticshelp/bin/answer.py?answer=1009683
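
A quick way to test the "code called in twice" theory is to count the tracking IDs in the saved page source. The sketch below is only illustrative: `SAMPLE_HTML` is a made-up stand-in for the real page (it reuses the ID quoted above and the classic `_gaq` snippet style), and the helper names are hypothetical.

```python
import re
from collections import Counter

# Classic Google Analytics property IDs look like UA-<account>-<profile>.
GA_ID = re.compile(r"\bUA-\d+-\d+\b")

def count_tracking_ids(html):
    """Count how often each GA property ID appears in the page source."""
    return Counter(GA_ID.findall(html))

def duplicated_ids(html):
    """Return the IDs that appear more than once, a hint the snippet is included twice."""
    return [ga_id for ga_id, n in count_tracking_ids(html).items() if n > 1]

# Hypothetical page source, NOT the real site's markup.
SAMPLE_HTML = """
<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-2109736-2']);
  _gaq.push(['_trackPageview']);
</script>
<script type="text/javascript">
<!-- include google analytics -->
  _gaq.push(['_setAccount', 'UA-2109736-2']);
</script>
"""

print(count_tracking_ids(SAMPLE_HTML))
print(duplicated_ids(SAMPLE_HTML))
```

An ID that shows up twice does not prove the tracking is broken, but it is a cheap first check before working through the troubleshooting list.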

Topic Closed

This topic has been closed to new replies.
