Huge WordPress BUG + Negative SEO impact

Viewing 15 replies - 1 through 15 (of 33 total)
  • Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    I think I found a huge WordPress bug which is actually not specific to 4.7.2 but has been present for a long time (perhaps from the beginning). This bug causes a huge negative SEO impact.

    I don’t think that’s the case. 😉

    I want you to take a deep breath and please do not tell me it is a hacking issue. I will prove it is not.

    It’s not a hacking issue.

    The ?p=anything+you+type+here+that+is+not+an+integer will do that because you’re using it incorrectly. That’s not a bug. The ?p= parameter is supposed to get an integer, not a string. When you pass a string, the parameter is ignored and you get the URL minus the incorrect ?p=non-integer-number-here.
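
To make the behaviour described above concrete, here is a minimal Python sketch. It is not WordPress's actual query parsing (WordPress is written in PHP); the function name `drop_invalid_post_id` is hypothetical, and the sketch only models the rule stated in this reply: a `p` value that is not a positive integer is treated as if it had never been supplied.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_invalid_post_id(url: str) -> str:
    """Return url with the p parameter removed unless it is a positive integer.

    Hypothetical helper: models the rule described in the thread,
    not the real WordPress implementation.
    """
    parts = urlsplit(url)
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        is_post_id = re.fullmatch(r"[0-9]+", value) is not None and int(value) > 0
        if key == "p" and not is_post_id:
            continue  # non-integer post ID: ignore it entirely
        kept.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Under these assumptions, `drop_invalid_post_id("https://example.com/category/news/?p=online+casinos")` gives back the plain category URL, while a numeric `?p=123` is left alone.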

    Your own examples demonstrate that. Not a bug.

    The problem is, when Google finds a link on the web like the ones I posted above, it begins to crawl the website with ?p=keyword added and when you do site:domain.com on google you notice that its index is filled with tons of duplicate links with

    Yeah. Sorry, THAT’S a hacking issue because nothing in WordPress will ever generate incorrect ?p=strings+here like that. The only reason that would happen is if something on your site did something like that.

    If you’ve deloused your site then Google will eventually sort it out. If you’ve left remnants of that then Google will do what it does.

    Whatever I did, I couldn’t convince people that it has nothing to do with hacking. It is a bug. It is weird that no one noticed it. Perhaps it has been noticed tons of times, but people always say “you’ve got this hacked, that hacked”.

    OK. What would you like to call it? It’s garbage in, garbage out. You didn’t do that to yourself. WordPress software does not do that. If it’s not hacking then how do you explain it? It’s not a bug because (again) WordPress will not generate those ?p=some+string URLs.

    WordPress will not generate those ?p=some+string URLs.

    put ?p=bla+bla in a category url and look at the pagination…

    all page/1 page/2 links have ?p=bla+bla included

    if wordpress does not generate those urls, what does?

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    if wordpress does not generate those urls, what does?

    You won’t like that answer. That’s one side effect when something is installed on your site and it’s blindly appending nonsense to generated URLs without reason or care.

    You won’t like that answer. That’s one side effect when something is installed on your site and it’s blindly appending nonsense to generated URLs without reason or care.

    fresh untouched wordpress installation behaves the same.

    whatever “appending nonsense” affects automattic.com and wordpress.org too.

    actually I don’t understand what you guys are arguing about.

    Actually the biggest problem with this is, if you put ?p=blah+blah after a category and hit enter, the returned page from wordpress includes ?p=blah+blah in pagination links.
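
The effect described above can be sketched in a few lines of Python. This is an illustration only, not WordPress's pagination code: the function `pagination_links` and its parameters are hypothetical, and the sketch simply assumes that pagination links are built from the current request and keep its query string.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def pagination_links(current_url: str, base_path: str, pages: int) -> List[str]:
    """Build /page/N/ links that carry over the query string of the current request.

    Hypothetical helper: it assumes the query string is copied verbatim,
    which is the behaviour reported in this thread.
    """
    parts = urlsplit(current_url)
    links = []
    for n in range(1, pages + 1):
        path = f"{base_path.rstrip('/')}/page/{n}/"
        # parts.query is reused untouched, so ?p=blah+blah survives on every link
        links.append(urlunsplit((parts.scheme, parts.netloc, path, parts.query, "")))
    return links
```

With a request for `https://example.com/category/news/?p=blah+blah`, every generated `/page/N/` link ends in `?p=blah+blah`, which is what a crawler then follows.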

    Then google (and other search engines) begin crawling the site and then tons of duplicate links appear in google index.

    This is a great opportunity for those who want to inflict a negative SEO impact on their competitors.

    They just post their competitors’ links with ?p=blah1, ?p=blah2, ?p=blah3 and so on… Google will end up with tens of thousands of duplicate links in its index.

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    fresh untouched wordpress installation behaves the same.

    whatever “appending nonsense” affects automattic.com and wordpress.org too.

    actually I don’t understand what you guys are arguing about.

    Did you read what I explained above?

    Again, if you put ?p=non-integer+or+string+or+both then WordPress ignores that as nonsense input. It is.

    They just post their competitors’ links with ?p=blah1, ?p=blah2, ?p=blah3 and so on… Google will end up with tens of thousands of duplicate links in its index.

    As to what Google does… When you put ?thisdoesnothinginwordpress=online+casinos you get the same thing. I don’t see how that’s a “Huge WordPress BUG + Negative SEO impact”. That’s how query strings work.

    The ?p= is a query string. Are you concerned that your competition can put links to your site on their site with ?thisdoesnothinginwordpress=online+casinos?

    I’m not arguing anything except that this isn’t a WordPress problem. If you want unknown or incorrect query strings to generate 404s (which is wrong BTW) then I’m sure that can be handled by .htaccess as you’ve suggested or a plugin.

    Yeah. Sorry, THAT’S a hacking issue because nothing in WordPress will ever generate incorrect ?p=strings+here like that. The only reason that would happen is if something on your site did something like that.

    So you mean wordpress.org and automattic.com are also hacked?

    https://wordpress.org/support/?p=online+casinos doesn’t return 404
    https://automattic.com/news/?p=online+casinos doesn’t return 404

    please install a fresh wordpress and try it yourself… or, since I assume you have at least one wordpress installation, try it on your own…

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    So you mean wordpress.org and automattic.com are also hacked?

    Please read what I said. Really, do that.

    What you are describing is how query strings are handled by WordPress. I never said that was a result of being hacked. I said if your site has URLs like what you’ve described on it then your site is hacked.

    @jdembowski

    every wordpress site is affected with this.

    look at this site.

    http://www.wpbeginner.com/category/wp-tutorials/page/2/?p=online%20casinos this is one of the most popular wordpress information sites around. now move your mouse over the “wordpress generated” pagination links (page 1, page 2, page 3, etc) at the bottom, and you will see that all of them have the ?p= string included.

    Once Google picks up a link with ?p= and follows it, WordPress returns every pagination URL with ?p= appended. Google treats each of them as a separate page and includes them in its index. Then a huge problem arises: your website has a duplicate content issue.
    I am blaming WordPress for this issue. It automatically adds ?p=blah to pagination links.

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    every wordpress site is affected with this.

    Did you read this part? Serious question.

    https://wordpress.org/support/topic/huge-wordpress-bug-negative-seo-impact/?view=all#post-8850430

    You’re acting like what you are describing is incorrect. It’s not.

    @jdembowski

    ok, now I am convinced that it is not just a wordpress issue, because the same problem appears on

    drupal sites
    https://www.drupal.org/news?p=online+casinos (look at pagination)

    microsoft sites (considering they are built with .net)
    https://azure.microsoft.com/en-us/blog/?p=online+casinos (look at pagination)

    tomorrow, all links I posted here will be crawled by google and those sites will end up having tons of online+casino duplicate links in google index.

    Do you think I just discovered a new tool to make negative seo?

    This is 2017, and after a year of heavy googling I found nothing except a suggestion to block query strings beginning with ?p= in .htaccess (which also leaves wp-admin and most plugins useless).

    Now this is not just my problem, but a problem for every website owner who wants to benefit from search engine results. How are we going to solve this, since it appears to affect huge sites as well?

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    ok, now I am convinced that it is not just a wordpress issue, because the same problem appears on

    Now we’re talking. 😉

    tomorrow, all links I posted here will be crawled by google and those sites will end up having tons of online+casino duplicate links in google index.

    No, they really won’t. It is 2017 and no search engine will weigh that old trick for anything. Those aren’t links, those are query strings. They don’t really count.

    What would count is if your site had hidden links to places with those keywords.

    Do you think I just discovered a new tool to make negative seo?

    I’m afraid not. That’s not had any results on SEO for years. As in at least 10.

    Why do you think I ended up here?

    My site consists of 180 posts, 9 pages, around 200 tags and 10 categories.

    When I do site:domain.com it appears that I have 18000 pages. (over 90% with bs ?p= links)

    A year ago I asked a similar question and gave my site as an example (idiot me). Google treated those URLs as links and followed them, and now it has 18000 combinations of my site with page/1/?p=blahblah, page/2/?p=blahblah, tag/keyword/?p=blahblah, etc. in its index.

    Since then, visitors dropped from 200k/month to 50k/month.

    I am trying to look at this from a programmer’s perspective (I am not an active programmer, I just graduated 17 years ago): if I type url/?p=bla, it shouldn’t be included in the rendered result.

    please do not close this topic, since there are tons of people who got rid of pharma hacks etc but are still seeing those links in search results

    both site owners and those who want to help think and focus on hacking related issues. Therefore they waste their time for nothing.

    in terms of seo, perhaps adding the line below in robots.txt may help

    Disallow: /*?p=
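
To see which URLs a rule like the one above would cover, here is a small Python sketch of wildcard matching as used by the major crawlers (`*` matches any run of characters, a trailing `$` anchors the end). The helper names are hypothetical, and the sketch assumes the rule is compared against the path plus query string; it does not claim to reproduce any particular crawler's parser.

```python
import re


def robots_pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern with * and $ into an anchored regex."""
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where the pattern had "*"
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))


def is_disallowed(path_and_query: str, disallow_pattern: str) -> bool:
    """Return True if the path (including query string) matches the Disallow pattern."""
    return robots_pattern_to_regex(disallow_pattern).search(path_and_query) is not None
```

Under these assumptions, `is_disallowed("/category/news/page/2/?p=online+casinos", "/*?p=")` is true and the same path without the query string is not matched. Note that the pattern cannot tell a junk value from a real one: `/?p=123` is matched as well, so on a site that uses default `?p=123` permalinks the rule would also hide legitimate posts from crawlers.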

    Sites like Ahrefs, which give tons of SEO advice, are also affected https://ahrefs.com/blog/archive/?p=online+casino and no measure has been taken https://ahrefs.com/robots.txt

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    please do not close this topic, since there are tons of people who got rid of pharma hacks etc but are still seeing those links in search results

    You do realize that search hits are a problem you can raise to Google?

    Edit:

    both site owners and those who want to help think and focus on hacking related issues. Therefore they waste their time for nothing.

    That’s not how it works.

    If your site was properly deloused then Google will sort that out and you’ll stop showing up for those hits.

    If that doesn’t happen, then either your site is not deloused properly or there’s a problem with Google’s crawling. You can request that Google re-index your site.

    Putting a 404 will get you a 404 alright but it will not fix Google.

    Edit of the edit: Right, robots.txt file, not .htaccess. OK then. No 404. 😉

    • This reply was modified 1 year, 8 months ago by Jan Dembowski.
    Moderator Samuel Wood (Otto)

    (@otto42)

    WordPress.org Admin

    When I do site:domain.com it appears that I have 18000 pages. (over 90% with bs ?p= links)

    In your specific case, this is because you have some code on your site which is removing the protocol parts of links in your HTML source code. Like, instead of “http://example.com”, you’re just posting links with “//example.com”.

    Now, while that is normally fine in a link, it’s not fine in the rel="canonical" meta.

    The rel-canonical is used to tell Google what the “canonical link” for a page is. So, if I have a category page, and you access it with ?p=whatever, then the canonical link in the head will tell Google that actually, the proper URL for this page does not include that query string. Google then acts accordingly, and doesn’t dupe the page in their index. This also brings the benefit of not appearing to have “duplicate content”.
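
As a rough illustration of that idea, the Python sketch below derives a canonical URL for an archive page by discarding the query string and emits the corresponding tag. The function names are hypothetical and the rule is a deliberate simplification: real canonical URL generation has to keep query parameters that genuinely identify content (for example a numeric `?p=123` on a site without pretty permalinks).

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(requested_url: str) -> str:
    """Return the requested URL without its query string or fragment.

    Simplified assumption: for a category or other archive page,
    the path alone identifies the content.
    """
    parts = urlsplit(requested_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(requested_url: str) -> str:
    """Render the <link rel="canonical"> element for the page head."""
    href = escape(canonical_url(requested_url), quote=True)
    return f'<link rel="canonical" href="{href}"/>'
```

For a request to `http://example.com/category/news/?p=whatever`, the tag points at `http://example.com/category/news/`, so a crawler that honours the hint can fold the junk URL into the real one.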

    Now, in your case, you have this in the head of your site:

    
    <link rel="canonical" href="//www.example.com/"/>
    

    Shockingly, that’s invalid. A canonical link must have a protocol. You cannot use a protocol-free URL there.
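
A quick way to check a site for this condition is to test whether the canonical href carries an explicit scheme. The Python sketch below does that with the standard library; the function name is hypothetical, and it encodes only the requirement stated in this reply (the href must start with a protocol such as http or https).

```python
from urllib.parse import urlsplit


def has_explicit_scheme(href: str) -> bool:
    """Return True if href is an absolute http(s) URL with a host.

    Protocol-relative values such as //www.example.com/ have an empty
    scheme and are rejected.
    """
    parts = urlsplit(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

With this check, `//www.example.com/` fails and `http://www.example.com/` passes.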

    So, whatever you’re doing to make that happen, well, that’s the real source of your problem. Because Google normally doesn’t do things like indexing from false query strings, because WordPress sites tell Google what the proper URLs of all pages actually are. Normally.

  • The topic ‘Huge WordPress BUG + Negative SEO impact’ is closed to new replies.