WordPress.org

Google bot ignores robots.txt (10 posts)

  1. mediacity
    Member
    Posted 2 years ago #

    Hey,

    I have a problem with Googlebot. It ignores my robots.txt file and crawls everything. I would like to block the tag, author, category, and page archives, plus some other directories. The file is in my root directory.

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /trackback
    Disallow: /feed
    Disallow: /comments
    Disallow: /author
    Disallow: /tag
    Disallow: /page
    Disallow: /archives
    Disallow: /category/*/*
    Disallow: */trackback
    Disallow: */feed
    Disallow: */comments
    Disallow: /*?*
    Disallow: /*?
    Disallow: /pogoji-uporabe/
    Disallow: /pravno-obvestilo/
    Allow: /wp-content/uploads

    User-agent: Googlebot
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.wmv$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    Disallow: /*.xlsx$
    Disallow: /*.doc$
    Disallow: /*.pdf$
    Disallow: /*.zip$

    Sitemap: http://www.example.com/sitemap.xml
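For readers unsure how the `*` and `$` characters in rules like these are meant to behave: Google documents `*` as matching any run of characters and a trailing `$` as anchoring the end of the URL, with every rule otherwise matching as a prefix. A minimal Python sketch of that matching follows. The function name `rule_matches` is made up for illustration, and this is not Google's actual parser.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Illustrative only: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and everything else is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    regex = body + ("$" if anchored else "")
    # re.match anchors at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None


for pattern, path in [
    ("/tag", "/tag/news/"),
    ("/*.php$", "/index.php"),
    ("/*.php$", "/index.php?page=2"),
    ("/category/*/*", "/category/news"),
]:
    print(pattern, path, rule_matches(pattern, path))
```

Under this sketch, `/category/*/*` only matches paths with at least one more slash after the category name, so a top-level category URL without a trailing slash would not be covered by that rule.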

  2. Samuel B
    moderator
    Posted 2 years ago #

  3. mediacity
    Member
    Posted 2 years ago #

    It reports errors, mostly about the "*" character. But there are no errors for the tag or page directories, and Googlebot still ignores those rules.

  4. riversatile
    Member
    Posted 2 years ago #

    It seems to be correct...

    Here is mine:

    [code moderated - please use the pastebin for any code of more than 10 lines]

  5. MickeyRoush
    Member
    Posted 2 years ago #

    Make sure your .txt file is saved as either ANSI or UTF-8 without BOM encoding.

  6. mediacity
    Member
    Posted 2 years ago #

    It is saved as UTF-8. For example, this rule is blocking successfully:

    Disallow: /*.php$

    But I really need to block page, tag, author, and category URLs, because they keep showing up in search results.

    When I test with Webmaster Tools it says:

    Allowed
    Detected as a directory; specific files may have different restrictions
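One possible explanation worth checking, based on how Google documents group selection: a crawler obeys only the most specific `User-agent` group that matches it and ignores all other groups. If that applies here, Googlebot would read only the `User-agent: Googlebot` group and never see the `/tag`, `/author`, and `/page` rules in the `*` group, which would fit `/*.php$` working while the others do not. Python's standard `urllib.robotparser` selects groups in a similar way, so the effect can be sketched with it. The file below is a simplified stand-in, not the original: the standard-library parser does not implement wildcards, so it uses literal paths, and `/private.php` and `OtherBot` are made-up names.

```python
from urllib.robotparser import RobotFileParser

# Simplified stand-in for the posted file (literal paths only, because
# urllib.robotparser does not implement the '*' and '$' extensions).
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag
Disallow: /author

User-agent: Googlebot
Disallow: /private.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group, so the '*' group is not consulted.
googlebot_tag = parser.can_fetch("Googlebot", "/tag/news/")
# A crawler without its own group falls back to the '*' group.
otherbot_tag = parser.can_fetch("OtherBot", "/tag/news/")
# The Googlebot-specific rule still applies to Googlebot.
googlebot_php = parser.can_fetch("Googlebot", "/private.php")

print(googlebot_tag, otherbot_tag, googlebot_php)
```

If this is the cause, one way to test it would be to repeat the general `Disallow` rules inside the `User-agent: Googlebot` group, or to fold the Googlebot rules into the `*` group and drop the separate group.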

  7. MickeyRoush
    Member
    Posted 2 years ago #

    It is saved as UTF-8.

    Try saving it with ANSI or UTF-8 without BOM (Byte Order Mark).
    To do this in Notepad++ go to:
    Encoding > Encode in ANSI
    or
    Encoding > Encode in UTF-8 without BOM

    If it's saved as UTF-8 with a BOM, it could throw Google off.

    If you can't change to ANSI or UTF-8 without BOM, try adding a comment at the beginning of the file. Something like:

    # This is the robots.txt file.
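For anyone without Notepad++, the BOM check described above can also be done with a few lines of Python. This is a minimal sketch: the helper names `has_utf8_bom` and `strip_utf8_bom` are made up for illustration, and the sample bytes stand in for the contents of a real robots.txt file.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


# Stand-in for a file saved as "UTF-8 with BOM".
sample = codecs.BOM_UTF8 + b"User-agent: *\nDisallow: /tag\n"
cleaned = strip_utf8_bom(sample)

print(has_utf8_bom(sample), has_utf8_bom(cleaned))
```

To check a real file, read it in binary mode (for example `open("robots.txt", "rb").read()`) and pass the result to `has_utf8_bom`.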

    Some links that may help:

    http://vincentwehren.com/2011/04/09/robots-txt-utf-8-and-the-utf-8-signature/

    http://www.google.com/support/forum/p/Webmasters/thread?tid=28d012e70d5fcdc8&hl=en

    I'm not sure this is your issue, though. I've just had problems with the encoding of the .txt file before, and this is what helped me.

  8. mediacity
    Member
    Posted 2 years ago #

    I tried everything suggested, but it still doesn't work. Googlebot keeps ignoring the robots.txt file. Any other suggestions?

  9. MickeyRoush
    Member
    Posted 2 years ago #

    Any other suggestions?

    You may want to go straight to their support forums:

    http://www.google.com/support/forum/p/Webmasters/label?lid=41234c84d9491af8&hl=en

  10. mediacity
    Member
    Posted 2 years ago #

    Thank you. I will try my luck there. ;)

Topic Closed

This topic has been closed to new replies.