WordPress.org

[resolved] Google bot - Does this make sense? (5 posts)

  1. Will Taft
    Member
    Posted 7 years ago #

    I opened a Google Webmaster Tools account a while ago, more out of curiosity and for education than any real need. I checked it today and noticed several "page not found" errors listed for my site that I would like to understand.

    A couple are for posts where I eliminated a category, so that makes sense to me.

    The others are all for posts that appear to have been indexed by Google while they were in draft form on my site. They are listed like this: mysite/wordpress/2007/04/18/. If I follow that link, the page is not found. The post with that date is now at mysite.com/category/nameofpost/.

    Does my suspicion make sense that Google indexed these pages before I published them, and now cannot find them because the published posts are at a different location?

    If I do a Google search for the page as it is after publishing, it shows up fine and shows the cache as being retrieved a couple of days ago.

    Thanks for any clues for the clueless on that!

    Additionally, I looked at all the pages Google has actually indexed for my site and it mostly looks fine except for:
    A couple of posts are listed twice. For example, one is both at mydomain.com/category/posttitle/ and again at mydomain.com/wordpress/index.php?P=42. What's up with that?

    Also, Google has indexed my WP login screen a few times, and also the "lost password" and "register" WP screens??

    Thanks.

  2. whooami
    Member
    Posted 7 years ago #

    Also Google has indexed my WP login screen a few times and also the "lost password" and "register" WP screens??

    I'll help with that ...

    You can use a robots.txt file to prevent "good" bots from spidering those pages:

    User-agent: *
    Disallow: /wp-register.php
    Disallow: /wp-login.php

    Done.

    The next thing you will want to do is use your newly created webmaster account at Google to ask that those pages be removed from their index.

    http://www.google.com/support/webmasters/bin/answer.py?answer=35301&topic=8459
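
The robots.txt rules in the reply above can be sanity-checked before the file is uploaded, for example with Python's standard-library urllib.robotparser. This is only a sketch: www.mydomain.com is a placeholder domain, "Googlebot" is used as an illustrative user-agent string, and the standard-library parser may differ in small details from how Google's own crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# The rules from the reply above, for a blog installed at the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-register.php
Disallow: /wp-login.php
"""

# Placeholder domain (assumption, not the poster's real site).
SITE = "http://www.mydomain.com"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The login and registration screens should be off limits to compliant bots.
login_allowed = parser.can_fetch("Googlebot", SITE + "/wp-login.php")
register_allowed = parser.can_fetch("Googlebot", SITE + "/wp-register.php")

# Ordinary posts should remain crawlable.
post_allowed = parser.can_fetch("Googlebot", SITE + "/category/posttitle/")

print("wp-login.php allowed:   ", login_allowed)
print("wp-register.php allowed:", register_allowed)
print("post allowed:           ", post_allowed)
```

Note that robots.txt only asks well-behaved crawlers to stay away; pages already in the index still need the removal request described above.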

  3. Will Taft
    Member
    Posted 7 years ago #

    Thanks whooami, lots of stuff to read there. One thing I did not see and am unclear on is:

    My WordPress is in its own directory. So, for example, wp-login.php is at root/wordpress/wp-login.php. However, when you go to a page or post on my site, the address is mysite.com/page or mysite.com/category/posttitle/.

    Does the robots.txt file still go in the root directory?

    Thanks!

    Also, can you or anyone else clear up my first questions in the initial post about Google indexing drafts and indexing some posts twice, once with the reference to the index.php file?

  4. whooami
    Member
    Posted 7 years ago #

    These 2 pages,

    wp-register.php
    wp-login.php

    are not affected by what or where your category links, post links, etc. are, so why would that be an issue?

    If your blog is at http://www.mydomain.com/blog

    then you simply adjust the path inside the robots.txt:

    User-agent: *
    Disallow: /blog/wp-register.php
    Disallow: /blog/wp-login.php

    Yes, it still goes in the web_root. It always goes in web_root.
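
To illustrate the point in the reply above, here is a small Python sketch. The helper name robots_url is hypothetical, and www.mydomain.com with a /blog install is the same placeholder used in the reply. It shows that the robots.txt address is derived from the host alone, and that the Disallow paths are what carry the subdirectory.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the web-root robots.txt URL for any page on the same host.

    Hypothetical helper for illustration: crawlers look for robots.txt
    at the root of the host, whatever directory WordPress lives in.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Rules from the reply above, for a blog installed under /blog.
BLOG_RULES = """\
User-agent: *
Disallow: /blog/wp-register.php
Disallow: /blog/wp-login.php
"""

# Rules written without the subdirectory, for comparison.
ROOT_RULES = """\
User-agent: *
Disallow: /wp-register.php
Disallow: /wp-login.php
"""


def is_allowed(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Check a URL against robots.txt text held in memory."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


login = "http://www.mydomain.com/blog/wp-login.php"

location = robots_url(login)
blocked_with_blog_rules = not is_allowed(BLOG_RULES, login)
blocked_with_root_rules = not is_allowed(ROOT_RULES, login)

print("robots.txt belongs at:", location)
print("blocked by /blog/ rules:", blocked_with_blog_rules)
print("blocked by root-style rules:", blocked_with_root_rules)
```

The root-style rules do not match /blog/wp-login.php because Disallow paths are matched as prefixes of the full URL path, which is why the path inside robots.txt has to be adjusted while the file itself stays in the web root.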

  5. Will Taft
    Member
    Posted 7 years ago #

    Thanks whooami. It's clear now. Always is an easy word to understand! :)

    I don't have a robots file at all now, so I will make one.

    -----------------
    Still, if anyone else can answer my questions in the initial post about Google indexing drafts and indexing some posts twice, once with the reference to the index.php file, I would really appreciate it. I can't see how to stop that if I can't figure out why it is happening. I understand the index.php file in my theme directory and have made several changes to it over time, but I don't understand the index.php files in the root/ and root/wordpress/ directories. It seems to be the one in the /wordpress directory that is getting indexed and showing duplicate posts?
