What’s your opinion about duplicate content within blog according to Google?

  • Hi,

    When we create a blog site using WordPress, whenever we add content, apart from appearing on the main page the same content also appears on sub-pages like RECENT POSTS, CATEGORIES and ARCHIVES.

    From Google Webmasters I came to know that Google considers this duplicate content within the domain. I copy the information from Google Webmasters below.


    Duplicate content

    Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.


    Google Webmasters suggests using robots.txt to prevent Google from indexing those pages within the blog domain (like RECENT POSTS, CATEGORIES and ARCHIVES) where duplicate content is present.

    What’s your opinion on this?

    Please reply.



Viewing 12 replies - 1 through 12 (of 12 total)
  • To avoid duplicate content, you can exclude URL prefixes such as the category or archive paths in robots.txt, and also direct Google, Yahoo, etc. to a sitemap.
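
    For reference, a minimal robots.txt along those lines might look like this (the /category/, /tag/ and /author/ paths are assumptions based on WordPress’s default permalink structure — adjust them to match your own URLs):

    ```
    # Keep crawlers out of the listing pages that repeat post content
    User-agent: *
    Disallow: /category/
    Disallow: /tag/
    Disallow: /author/
    ```

    Note that robots.txt matches URL prefixes, so `Disallow: /category/` covers every category page at once.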

    I use the robots.txt method, but have not used a site map. Do you know if sitemaps allow you to exclude areas from indexing? My understanding was that a sitemap provided directions to all parts of your site and would defeat the purpose of a limiting robots.txt file. Is that a misunderstanding?

    Google has their own sitemap format. There are plugins for WordPress to generate this for Google automagically.

    Google (XML) Sitemaps Generator for WordPress

    But if you provide a sitemap for Google using this plugin, doesn’t it defeat the limiting rules in a customized robots.txt file?

    Why would it? One is telling Google what not to look at, the other is telling Google what you do want it to look at. The Sitemap generated doesn’t have to include categories and archives and such, you know…

    OK, that is the part I misunderstood, not having made a sitemap with the plugin. I thought it just provided a map to your whole site. Thanks!

    So Google looks at both and applies the robots.txt file ahead of the sitemap?

    If Google does not look at both, or does not apply them in that order, then it seems the blogger must repeat the exclusions from their robots.txt file in the sitemap (by simply leaving those URLs out of it).


    That’s also what I understood Otto to be saying. I’m sure someone will let us know if we are not getting it.

    The sitemap tells Google what URLs to check on your site. It provides a map to Google telling them what you want them to index.

    The robots.txt prevents Google from checking certain pages on your site.
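
    As an illustration, a bare-bones XML sitemap that lists only the individual posts would look something like this (the URLs are placeholders — in practice the plugin generates the file for you):

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- Only the canonical post URLs are listed; category and archive
           pages are simply left out, so nothing here conflicts with robots.txt -->
      <url>
        <loc>http://example.com/my-first-post/</loc>
        <lastmod>2009-05-01</lastmod>
      </url>
      <url>
        <loc>http://example.com/another-post/</loc>
      </url>
    </urlset>
    ```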

    Either is effective. I find that the robots.txt is really unnecessary, but if you want to use it, go for it.

    Thanks, Otto, I’ll use one or the other – probably the Sitemap.

    I’ll check my robots.txt, and then remove it, so I’m not entangled in 2 methods when 1 will do.

    On the sitemap generator plugin configuration page, there is an option you can check that will write the location of your sitemap into your robots.txt file. Google does check robots.txt and will then go to the sitemap even if you haven’t submitted the sitemap URL to them. The 2 files serve different purposes and shouldn’t conflict.
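
    The line the plugin writes is the standard Sitemap directive, so the combined robots.txt ends up looking something like this (the exclusions and the sitemap path are assumptions — the plugin’s default is sitemap.xml in the site root):

    ```
    User-agent: *
    Disallow: /category/

    # Added by the sitemap plugin so crawlers can find the sitemap
    # without you submitting its URL manually
    Sitemap: http://example.com/sitemap.xml
    ```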

    So you can exclude prefixes in robots.txt and direct search bots to the sitemap, and then if you want to specify noindex for WP sub-folders, you can do that with an .htaccess file in the folders themselves.
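
    One common way to do the per-folder part is the X-Robots-Tag response header. A sketch of such an .htaccess file, assuming Apache with mod_headers enabled:

    ```
    # Placed in e.g. wp-content/ to ask search engines not to index
    # anything served from this folder (requires mod_headers)
    <IfModule mod_headers.c>
        Header set X-Robots-Tag "noindex, nofollow"
    </IfModule>
    ```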

    Thanks VeraBass,

    That is clear.

    I was operating under a faulty understanding.

  • The topic ‘What’s your opinion about duplicate content within blog according to Google?’ is closed to new replies.