duplicate content links from external sites probably black hat (1 post)

  1. becki
    Member
    Posted 11 months ago #

    dear wordpress users ;-)

    i'm running the latest WP 3.1.4 with a multisite install and MU domain mapping with about 10 TLD domains

    recently, when searching google with site:domain.tld to check what google has indexed on my sites so far, i saw strange-looking links. they don't come from my site itself, so someone must have linked to my sites with special URLs to create duplicate content in google

    the permalink structure at my site is set up like this:
    /%category%/%postname%-%post_id%.html

    in my opinion good for white hat seo

    but the URLs listed in the google site:domain.tld results look like this, and google lists them as additional 'valid' URLs

    for example:

    http://domain.org/page/23/?module=postguestbook&func=view&page=10
    http://domain.org/page/1/?module=postguestbook&func=view&page=7
    http://domain.org/page/2/?module=postguestbook&func=view&page=23
    http://domain.org/page/6/?module=postguestbook&func=view&page=77

    those URLs display the content of /page/23/, /page/1/, /page/2/, /page/6/ and so on ...

    the content is totally valid. but this results in duplicate content as google thinks that

    http://domain.org/page/23/ (valid link)

    and

    http://domain.org/page/23/?module=postguestbook&func=view&page=10 (duplicate content link)

    are not the same! in my opinion someone is playing nasty black hat tricks with my site and is pointing those duplicate links at it ... bastards ;-)
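
    to make the idea concrete, here is a minimal python sketch of one way to tell google that both URLs are the same page: a rel="canonical" link that always points at the URL without the query string. the function name canonical_link_tag is made up for illustration, and in wordpress itself this would have to be done in php inside the theme or a plugin, so treat it as a sketch of the logic only.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(requested_url):
    """Build a rel="canonical" tag that ignores the query string.

    Hypothetical helper: assumes the query string never changes
    which content is shown, as with the /page/NN/ URLs above.
    """
    parts = urlsplit(requested_url)
    # Keep scheme, host and path; drop query string and fragment.
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return '<link rel="canonical" href="%s">' % escape(canonical, quote=True)
```

    with this, the valid link and the duplicate link from the example above both produce the same tag, pointing at http://domain.org/page/23/.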

    ok ... i can filter out complete directories with google webmaster tools and set the right robots.txt so those pages like /page/1/ ... /page/23/ .. etc won't get indexed!
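
    a small sketch of how such a robots.txt rule could be checked before uploading it, using python's standard urllib.robotparser. the two-line robots.txt and the helper name is_crawlable are my own assumptions, not something taken from the site.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block every path that starts with /page/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /page/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT):
    """Return True if a generic crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)
```

    with these rules both http://domain.org/page/23/ and the variant with ?module=postguestbook&func=view&page=10 are blocked, while a normal post URL stays crawlable. note that robots.txt only stops crawling; a blocked URL that is linked from elsewhere may still show up in the index.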

    is there a plugin available where i can set the 'noarchive' / 'noindex' meta tags for this /page/xx/ structure?

    i looked at headspace2 plugin which allows me to set the meta tag for category & tag pages. but is there an option / plugin to do this for the /page/x/ structure once someone clicks on 'older post' and browses through the site? any help would be highly appreciated - thanks!
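
    what i have in mind is roughly the following logic, sketched in python only to show the rule (a real wordpress solution would be php in the theme header or in a plugin). the pattern PAGED_PATH and the function robots_meta_for are hypothetical names, and the pattern assumes paginated archives always end in /page/<number>/.

```python
import re

# Assumption: paginated archive URLs end in /page/<number>/,
# optionally below a category or tag path.
PAGED_PATH = re.compile(r"^/(?:.+/)?page/\d+/?$")


def robots_meta_for(path):
    """Return a robots meta tag for paginated paths, else an empty string."""
    if PAGED_PATH.match(path):
        return '<meta name="robots" content="noindex,noarchive">'
    return ""
```

    so /page/23/ and /some-category/page/2/ would get the tag, while a post like /news/my-post-12.html would not.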

    then i see other strange URLs listed in google index, e.g.

    http://domain.com/?something=something
    http://domain.com/?param=value
    http://domain.com/?buy=viagra

    and so on ...

    none of those links come from my site itself. they must be generated somewhere else; google picks them up, visits the site and indexes those pages, because wordpress does not filter them out and does not serve a 404 or a redirect

    all those strange links above always go to the root URL of the site

    http://domain.com/

    and therefore generate duplicate content URLs

    which is of course really bad ;-(

    can someone point me to a plugin that filters duplicate URLs and redirects them? or point me to a workaround to fix this issue?
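
    the kind of filter i mean could look like this python sketch: keep only query parameters the site really uses and answer everything else with a 301 redirect to the cleaned URL. the allow-list KNOWN_PARAMS and the function redirect_target are assumptions made up for illustration; the real list would depend on the site and its plugins.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allow-list of query parameters the site actually uses.
KNOWN_PARAMS = {"s", "p", "page_id", "cat", "paged", "preview"}


def redirect_target(requested_url):
    """Return the cleaned URL to 301-redirect to, or None if the URL is fine."""
    parts = urlsplit(requested_url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key in KNOWN_PARAMS]
    if len(kept) == len(pairs):
        return None  # nothing unknown in the query string, no redirect needed
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
```

    a request for http://domain.com/?buy=viagra would then be redirected to http://domain.com/, while a normal search like http://domain.com/?s=hello is left alone.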

    if i don't catch those bad URLs, someone can really harm my site, as this allows endless spamming with made-up parameters such as:

    var=1
    var=23
    blackhat=good
    blackhat=bad
    whatever=andsoon

    etc etc etc ...

    i think someone is really playing nasty black hat tricks on my site, and help is highly appreciated

    thanks a million & wish you happy blogging and protecting your sites against evil competitors

    greetings & fun
    becki
