dear wordpress users ;-)
i'm running the latest WP 3.1.4 with a multisite install and MU domain mapping with about 10 TLD domains
recently, when searching google with site:domain.tld to check what it has indexed on my site so far, i found strange-looking links. they don't come from my site itself, so someone must have linked to my sites with special URLs to create duplicate content in google
the permalink structure on my site is set up like this ->
/%category%/%postname%-%post_id%.html
in my opinion good for white hat seo
but the URLs listed in the google site:domain.tld report look like this, and google lists them as additional 'valid' URLs
for example:
http://domain.org/page/23/?module=postguestbook&func=view&page=10
http://domain.org/page/1/?module=postguestbook&func=view&page=7
http://domain.org/page/2/?module=postguestbook&func=view&page=23
http://domain.org/page/6/?module=postguestbook&func=view&page=77
those URLs display the same content as the plain /page/23/, /page/1/, /page/2/, /page/6/ and so on ... the query string is simply ignored
the content is totally valid, but this results in duplicate content, as google thinks that
http://domain.org/page/23/ (valid link)
and
http://domain.org/page/23/?module=postguestbook&func=view&page=10 (duplicate content link)
are not the same! in my opinion someone is playing nasty black hat tricks with my site and is pointing those duplicate links at it ... bastards ;-)
ok ... i can filter out complete directories with google webmaster tools and set the right robots.txt so those pages like /page/1/ ... /page/23/ .. etc won't get indexed!
is there a plugin available where i can set a 'noarchive' / 'noindex' meta tag for this /page/xx/ structure?
i looked at the headspace2 plugin, which allows me to set the meta tag for category & tag pages. but is there an option / plugin to do this for the /page/x/ structure that appears once someone clicks on 'older posts' and browses through the site? any help would be highly appreciated - thanks!
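to make clear what i'm after, here is a rough sketch of the rule in python (not wordpress code, just the logic). the function name and the exact meta tag string are my own assumptions, a real plugin would hook this into the theme header instead:

```python
import re
from urllib.parse import urlsplit

# matches the paginated archive paths /page/1/, /page/23/ and so on
PAGED_PATH = re.compile(r"^/page/\d+/?$")


def robots_meta_for(url: str) -> str:
    """Return a robots meta tag for /page/N/ URLs, or '' for everything else.

    Hypothetical helper: only the path is checked, so the tag is emitted
    no matter which query string was appended to the URL.
    """
    path = urlsplit(url).path
    if PAGED_PATH.match(path):
        return '<meta name="robots" content="noindex,noarchive" />'
    return ""


if __name__ == "__main__":
    print(robots_meta_for("http://domain.org/page/23/?module=postguestbook&func=view&page=10"))
    print(repr(robots_meta_for("http://domain.org/some-category/some-post-12.html")))
```

so every /page/N/ view would carry the tag, and normal post URLs would stay untouched.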
then i see other strange URLs listed in google index, e.g.
http://domain.com/?something=something
http://domain.com/?param=value
http://domain.com/?buy=viagra
and so on ...
none of those links come from my site itself. they must be generated somewhere else, and google picks them up, visits the site and indexes those pages, because wordpress does not filter them out and serves neither a 404 nor a redirect
all those strange links above always go to the root URL of the site
and therefore generate duplicate content URLs
which is of course really bad ;-(
can someone point me to a plugin that filters out duplicate URLs and redirects them? or point me to a workaround to fix this issue?
if i don't catch those bad URLs someone can really harm my site, since they can make up an unlimited number of parameters, for example:
var=1
var=23
blackhat=good
blackhat=bad
whatever=andsoon
etc etc etc ...
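the kind of filter i have in mind would look roughly like this. it is a python sketch of the logic only, not a wordpress plugin, and the whitelist of allowed parameters is just my assumption about what a site might need:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# assumption: the only query parameters the site really uses
ALLOWED_PARAMS = {"s", "p", "page_id", "preview"}


def canonical_url(url: str) -> str:
    """Drop every query parameter that is not on the whitelist."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in ALLOWED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def redirect_target(url: str):
    """Return the URL to 301-redirect to, or None if the URL is already clean."""
    clean = canonical_url(url)
    return clean if clean != url else None


if __name__ == "__main__":
    for url in (
        "http://domain.com/?buy=viagra",
        "http://domain.org/page/23/?module=postguestbook&func=view&page=10",
        "http://domain.com/?s=hello",
    ):
        print(url, "->", redirect_target(url))
```

with something like this in place, the junk URLs would be redirected to the clean address (or could get a rel="canonical" tag pointing to it) instead of being served as a second copy of the same page.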
i think someone is really playing nasty black hat tricks on my site, and any help is highly appreciated
thanks a million & wish you happy blogging and protecting your sites against evil competitors
greetings & fun
becki