
What's your opinion about duplicate content within a blog, according to Google? (13 posts)

  1. econos
    Member
    Posted 6 years ago #

    Hi,

    When we create a blog site using WordPress, whenever we add content, apart from appearing on the main page, the same content also appears on sub-pages like RECENT POSTS, CATEGORIES and ARCHIVES.

    From Google Webmasters I learned that Google considers this duplicate content within the domain. I copy the relevant information from Google Webmasters below.

    ==================================

    http://www.google.com/support/webmasters/bin/answer.py?answer=66359

    Duplicate content

    Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.

    ==================================

    Google Webmasters suggests using robots.txt (http://www.google.com/support/webmasters/bin/answer.py?answer=35303) to prevent Google from indexing those pages within the blog domain (like RECENT POSTS, CATEGORIES and ARCHIVES) where duplicate content is present.

    What's your opinion on this?

    Please reply.

    Thanks.

    Econos.

  2. VeraBass
    Member
    Posted 6 years ago #

    To avoid duplicate content, you can exclude URL prefixes such as the category or archive paths in your robots.txt, and also direct Google, Yahoo, etc. to a sitemap.
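
    For example, a robots.txt along these lines would keep crawlers out of the listing pages that just repeat post content. This is only a sketch: it assumes the default /category/ and /tag/ prefixes, so adjust the paths to match your own permalink settings.

      User-agent: *
      # archive-style listings that duplicate the posts themselves
      Disallow: /category/
      Disallow: /tag/
      Disallow: /author/

    You would add similar lines for the date archives if your post permalinks don't already start with the year.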

  3. Will Taft
    Member
    Posted 6 years ago #

    I use the robots.txt method, but have not used a site map. Do you know if sitemaps allow you to exclude areas from indexing? My understanding was that a sitemap provided directions to all parts of your site and would defeat the purpose of a limiting robots.txt file. Is that a misunderstanding?

  4. Samuel Wood (Otto)
    Tech Ninja
    Posted 6 years ago #

    Google has their own sitemap format. There are plugins for WordPress to generate this for Google automagically.

    http://www.arnebrachhold.de/projects/wordpress-plugins/google-xml-sitemaps-generator/
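
    The plugin builds the file for you, but for the curious, a sitemap is basically just a list of the URLs you want indexed, roughly like this (the URLs are made-up placeholders):

      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
          <loc>http://example.com/</loc>
        </url>
        <url>
          <loc>http://example.com/some-post/</loc>
          <lastmod>2008-05-14</lastmod>
        </url>
      </urlset>

    Nothing forces you to list the category or archive pages in it.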

  5. Will Taft
    Member
    Posted 6 years ago #

    But if you provide a sitemap for Google using this plugin, doesn't it defeat the limiting rules in a customized robots.txt file?

  6. Samuel Wood (Otto)
    Tech Ninja
    Posted 6 years ago #

    Why would it? One is telling Google what not to look at, the other is telling Google what you do want it to look at. The Sitemap generated doesn't have to include categories and archives and such, you know...

  7. Will Taft
    Member
    Posted 6 years ago #

    OK, that is the part I misunderstood, not having made a sitemap with the plugin. I thought it just provided a map to your whole site. Thanks!

  8. imagiscapeca
    Member
    Posted 6 years ago #

    So Google looks at both and applies the robots file ahead of the sitemap?

    If Google does not look at both, or does not apply them in that order, then it seems the blogger must repeat the exclusions from their robots file in the sitemap (by leaving those pages out of it).

    Right?

  9. Will Taft
    Member
    Posted 6 years ago #

    That's also what I thought Otto to be saying. I'm sure someone will let us know if we are not getting it.

  10. Samuel Wood (Otto)
    Tech Ninja
    Posted 6 years ago #

    The sitemap tells Google what URLs to check on your site. It provides a map to Google telling them what you want them to index.

    The robots.txt prevents Google from checking certain pages on your site.

    Either is effective. I find that the robots.txt is really unnecessary, but if you want to use it, go for it.

  11. imagiscapeca
    Member
    Posted 6 years ago #

    Thanks, Otto, I'll use one or the other - probably the Sitemap.

    I'll check my robots.txt, and then remove it, so I'm not entangled in 2 methods when 1 will do.

  12. VeraBass
    Member
    Posted 6 years ago #

    On the sitemap generator plugin configuration page, there is an option you can check that will write the location of your sitemap into your robots.txt file. Google does check the robots.txt and will then go to the sitemap even if you haven't submitted the sitemap URL to them. The 2 files serve different purposes and shouldn't conflict.

    So you can exclude prefixes in the robots.txt and direct search bots to the sitemap, and then if you want to specify noindex for WP subfolders, you can do that with an .htaccess file in the folders themselves.
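
    Put together, the robots.txt might end up looking something like this (the paths and the sitemap URL are placeholders for your own site):

      User-agent: *
      Disallow: /category/
      Disallow: /tag/

      # the line the plugin option adds, pointing bots at your sitemap
      Sitemap: http://example.com/sitemap.xml

    And if you do go the .htaccess route for a subfolder, one way (assuming Apache with mod_headers available) is to send a noindex header from an .htaccess file placed in that folder:

      Header set X-Robots-Tag "noindex, nofollow"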

  13. imagiscapeca
    Member
    Posted 6 years ago #

    Thanks VeraBass,

    That is clear.

    I was operating under a faulty understanding.

Topic Closed

This topic has been closed to new replies.
