WordPress treating URL number suffix strangely

  • Hi guys,

    Weird one here, brought about by Google Search Console showing me that Google is indexing hundreds of pages of my site…

    WordPress seems to treat any number on the end of a valid URL as a real page and serve it as a separate page, complete with its own canonical URL tag. Example:

    Real Page: http://www2.cuny.edu/academics/academic-programs/
    Made up page: http://www2.cuny.edu/academics/academic-programs/2/

    I have used this page as my example to show that it’s not just happening with my install (I also tested with a default theme and all plugins disabled).

    The result of this craziness is that Google will index these variants, causing duplicate content issues.

    I would have expected WordPress to redirect to the actual page and go from there.

    Thoughts? Help!

Viewing 4 replies - 1 through 4 (of 4 total)
  • Moderator bcworkz

    (@bcworkz)

    That pagination format is meant for posts or pages that are split into several pages, with the page breaks defined by the <!--nextpage--> tag. Google doesn’t crawl pages unless it has found a link to them somewhere. Where is Googlebot getting links with pagination numbers that don’t actually exist? The best recourse is to eliminate such links.
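    To illustrate the point above, here is a simplified model of post pagination in Python: each <!--nextpage--> tag starts a new page, so a URL ending in /2/ only corresponds to real content if the post contains at least one such tag. The helper names (count_post_pages, is_real_post_page) are hypothetical and not part of WordPress, and the sketch ignores the finer details of how WordPress itself parses the tag.

```python
# Hypothetical helpers (not part of WordPress): a simplified model of
# post pagination, where each <!--nextpage--> tag starts a new page.
NEXTPAGE_TAG = "<!--nextpage-->"


def count_post_pages(content: str) -> int:
    """Return how many pages a post's content is split into."""
    # N page-break tags split the content into N + 1 pages.
    return content.count(NEXTPAGE_TAG) + 1


def is_real_post_page(content: str, page: int) -> bool:
    """Return True if page number `page` actually exists for this content."""
    return 1 <= page <= count_post_pages(content)


if __name__ == "__main__":
    unpaginated = "<p>Academic programs overview</p>"
    paginated = "<p>Part one</p>" + NEXTPAGE_TAG + "<p>Part two</p>"

    print(count_post_pages(unpaginated))      # 1
    print(is_real_post_page(unpaginated, 2))  # False: /2/ has no content of its own
    print(is_real_post_page(paginated, 2))    # True
```

    Under this model, the /2/ URLs in the original question would fail the check, since those pages contain no nextpage tag.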

    You can at least stop the canonical link tag from including the post page number by using the “get_canonical_url” filter. That way you at least won’t be penalized for having duplicate pages. If you never use post pagination, you can simply strip the trailing number from any such URL, except terminal numbers preceded by “page/”.
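    The stripping rule described above can be sketched as a plain string transformation. In WordPress it would be written in PHP inside a callback attached to the “get_canonical_url” filter; the Python below only demonstrates the URL rule itself. It assumes the site never uses post pagination and that no permalink ends in a purely numeric slug. The function name strip_post_page_number is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_post_page_number(url: str) -> str:
    """Hypothetical helper: drop a trailing numeric path segment such as /2/,
    unless it is archive pagination in the form /page/2/."""
    parts = urlsplit(url)
    trailing_slash = parts.path.endswith("/")
    segments = [s for s in parts.path.split("/") if s]

    # Nothing to do unless the last segment is an ASCII number.
    if not segments or not (segments[-1].isascii() and segments[-1].isdigit()):
        return url
    # Keep archive pagination such as /page/2/ untouched.
    if len(segments) >= 2 and segments[-2] == "page":
        return url

    segments.pop()  # remove the post page number
    path = "/" + "/".join(segments)
    if trailing_slash and segments:
        path += "/"  # preserve the site's trailing-slash style
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(strip_post_page_number("http://www2.cuny.edu/academics/academic-programs/2/"))
    # http://www2.cuny.edu/academics/academic-programs/
    print(strip_post_page_number("http://example.com/blog/page/2/"))
    # http://example.com/blog/page/2/  (archive pagination is kept)
```

    Because it removes every trailing page number, this rule would also collapse the legitimate extra pages of posts that do use <!--nextpage-->, which is why it only suits sites that never paginate posts.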

    Thread Starter galapogos01

    (@galapogos01)

    I will of course try to find the source of the indexing; however, that is irrelevant to the topic of this request, which is why WordPress is serving duplicate content for non-existent pages.

    The examples I gave and the ones on my own site do not have any paginated content with a nextpage tag. In these cases, get_canonical_url includes the /2 page, as does get_post_url (which does not have a filter). This affects multiple base pages and hundreds of “fake” pages on my site.

    I could play around with filters, but this seems to be a bug to me. Should this be submitted as a bug rather than a support request?

    Cheers,
    Jason

    Moderator bcworkz

    (@bcworkz)

    If the erroneous links were not generated in the first place, the issue would be moot. However, the purpose of the canonical link tag is to resolve redundant URLs into a single authoritative URL. It’s failing to do that, so in that respect it is a bug in my mind. I have nothing to do with core code decisions, though, so it doesn’t matter what I think. It’s at least worth discussing in a Trac ticket.

    Thread Starter galapogos01

    (@galapogos01)

    Thanks.

    I have no idea how they were generated, but Google has now indexed 96 pages of nothing on my site, so I need to work out how to fix it!

    Ticket raised at https://core.trac.wordpress.org/ticket/43928#ticket

    Cheers,
    Jason

  • The topic ‘WordPress treating URL number suffix strangely’ is closed to new replies.