mediacity
Member
Posted 8 months ago #
Hey,
I have a problem with Googlebot. It ignores my robots.txt file and crawls everything. I would like to block the tag, author, category and page archives and some other directories. The file is in my root directory.
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /author
Disallow: /tag
Disallow: /page
Disallow: /archives
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Disallow: /pogoji-uporabe/
Disallow: /pravno-obvestilo/
Allow: /wp-content/uploads
User-agent: Googlebot
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.xlsx$
Disallow: /*.doc$
Disallow: /*.pdf$
Disallow: /*.zip$
Sitemap: http://www.example.com/sitemap.xml
mediacity
Member
Posted 8 months ago #
It gives errors, mostly about the "*" character. But there are no errors for the tag or page directories, and Googlebot still ignores those rules.
It seems to be correct...
Here is mine:
[code moderated - please use the pastebin for any code of more than 10 lines]
MickeyRoush
Member
Posted 8 months ago #
Make sure your .txt file is saved as either ANSI or UTF-8 without BOM encoding.
mediacity
Member
Posted 8 months ago #
It is saved as UTF-8. For example, this rule blocks successfully:
Disallow: /*.php$
But I really need to block page, tag, author, category... because they keep showing up in search results.
When I test with Webmaster Tools it says:
Allowed
Detected as a directory; specific files may have different restrictions
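One possible explanation, which nobody in this thread confirms: under the Robots Exclusion Protocol a crawler obeys only the most specific User-agent group that matches it and does not merge in the "*" group. If that is what happens here, Googlebot would read only the "User-agent: Googlebot" rules, which would fit /*.php$ being blocked while /tag is reported as allowed. Below is a rough Python sketch of that behaviour on a shortened copy of the file. The function names are made up for illustration, and the parsing and matching are simplified (exact user-agent names, "*" and "$" wildcards, longest rule wins), so treat it as a sketch and not as Google's actual parser.

```python
import re

# Shortened copy of the robots.txt above (assumption: same group layout).
ROBOTS = """\
User-agent: *
Disallow: /tag
Disallow: /author

User-agent: Googlebot
Disallow: /*.php$
"""

def parse_groups(text):
    """Map each lowercase user-agent name to its list of (directive, path) rules."""
    groups, agents, in_rules = {}, [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                         # user-agent after rules: new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    return groups

def select_group(groups, crawler):
    """Use the crawler's own group if present; fall back to '*' only otherwise."""
    return groups.get(crawler.lower(), groups.get("*", []))

def pattern_matches(pattern, path):
    """Prefix match where '*' matches any characters and a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

def is_allowed(rules, path):
    """Longest matching rule wins; on a tie, Allow beats Disallow."""
    matches = [(len(p), d == "allow") for d, p in rules if p and pattern_matches(p, path)]
    return max(matches)[1] if matches else True

groups = parse_groups(ROBOTS)
for bot in ("googlebot", "otherbot"):
    for path in ("/tag/news/", "/index.php"):
        print(bot, path, "allowed" if is_allowed(select_group(groups, bot), path) else "blocked")
```

If this is the cause, one way to test it would be to repeat the rules from the "*" group inside the Googlebot group and run the Webmaster Tools test again.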
MickeyRoush
Member
Posted 8 months ago #
"It is saved as UTF-8."
Try saving it with ANSI or UTF-8 without BOM (Byte Order Mark).
To do this in Notepad++ go to:
Encoding > Encode in ANSI
or
Encoding > Encode in UTF-8 without BOM
If it's just UTF-8 (which in Notepad++ means UTF-8 with a BOM), it could throw Google off.
If you can't change to ANSI or UTF-8 without BOM, try adding a comment at the beginning of the file. Something like:
# This is the robots.txt file.
Some links that may help:
http://vincentwehren.com/2011/04/09/robots-txt-utf-8-and-the-utf-8-signature/
http://www.google.com/support/forum/p/Webmasters/thread?tid=28d012e70d5fcdc8&hl=en
But I'm not sure if this is your issue. I've had problems with the encoding of the .txt file myself, and this is what helped me.
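If you would rather check for a BOM without opening an editor, a small Python sketch like the one below should do it. The helper names are hypothetical, and the example works on in-memory bytes; for the real file you would read it in binary mode with open(path, "rb") first.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes with a leading UTF-8 BOM removed, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

# Two sample robots.txt payloads: one saved with a BOM, one without.
without_bom = b"User-agent: *\nDisallow: /tag\n"
with_bom = codecs.BOM_UTF8 + without_bom

print(has_utf8_bom(with_bom))      # file saved as "UTF-8" with signature
print(has_utf8_bom(without_bom))   # file saved as "UTF-8 without BOM"
```

Writing the output of strip_utf8_bom back to the file in binary mode gives the same result as re-saving with "Encode in UTF-8 without BOM".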
mediacity
Member
Posted 8 months ago #
I tried everything suggested, but it still doesn't work. Googlebot keeps ignoring the robots.txt file. Any other suggestions?
MickeyRoush
Member
Posted 8 months ago #
mediacity
Member
Posted 8 months ago #
Thank you. I'll try my luck there. ;)