Prevent wp serving pages by number?

  • My WP installation serves pages when they are requested as mydomain.com/443,
    without even a ?p= in the URL.

    It’s driving me crazy because I find pages indexed in Google with the same content as the “pretty” permalinked ones.

    How do I prevent WP from serving pages like that?

Viewing 15 replies - 1 through 15 (of 30 total)
  • More specifically:
    Given /blog setup as the “blog” page (WP installed in root)

    Page -> takes me to
    /blog/695 -> /blog/ (the blog page)
    /695 -> 404 not found

    We just moved WP from /blog to /, and although we found these duplicated pages, the problem was already there before the move.

    There are pages indexed in Google as /blog/695, whose cached (indexed) content is the correct content for page ID 695. Oddly, it appears as /blog/695 dated Feb 22 with the old content, and as /695 with the new content after the move, but the latter still shows the same old date, Feb 22.

    So I understand there’s some mismatch or delay in Google’s indexing until the old and new URLs are merged, along with their date, cached content and visit counts (it shows both the new and the old URLs)…

    For now I’ll wait, but preventing WP from serving pages with numbers in the URL is a must.

    Correction:
    It only happens with posts.
    And if post #695 DOES exist, /695 gets it correctly. The above example was based on a non-existing page, just to show the difference between not-found pages under /blog and right under /. One shows a 404 and the other shows the blog page.

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    Can you provide a link to demonstrate what’s going on?

    I think you want mydomain.com/123 to get 301 redirected to the correct mydomain.com/?p=123 which would also send it to the fancy permalink if it was set.

    Well, not necessarily. I don’t care if /123 is not redirected to /?p=123, as long as WHATEVER is retrieved has a canonical link pointing to /page-name, or is a proper 404.
    I’ll try your suggestion in .htaccess as soon as I get a regex working (I suck at Apache’s regexes).

    Anything that avoids a duplicated content issue would be fine.

    The site is biscaynebayfishing.com.

    I created these rules, placed before WP’s in the .htaccess file, when I moved the blog to the root (and created a /blog page):

    # Redirections for static pages made dynamic
    RewriteEngine On
    # Try removing /blog/ from url first
    RewriteCond ^/blog/?%{REQUEST_FILENAME}$ -f
    RewriteRule  ^(.+)  /$1  [R=301]
    # If not found, try removing html
    RewriteCond %{REQUEST_FILENAME}\.html !-f
    RewriteRule ^(.*)\.html$ /$1 [R=301,L]

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    I hadn’t suggested a .htaccess but I was thinking along those lines. 🙂

    I don’t think you necessarily need to change anything. Here’s why.

    This is a canonical URL

    http://www.biscaynebayfishing.com/miami-biscayne-bay-and-flamingo-fishing-reportsnook-and-bonefish

    That’s post ID 705 (it’s in the HTML).

    These 3 URLs work and send the browser to the correct location.

    http://www.biscaynebayfishing.com/blog/miami-biscayne-bay-and-flamingo-fishing-reportsnook-and-bonefish
    http://www.biscaynebayfishing.com/?p=705
    http://www.biscaynebayfishing.com/blog/?p=705

    Each of those requests was answered with a 301 Moved Permanently and sent to the canonical URL.

    This request got a 404 page (the HTTP status was actually 200, but the HTML page said 404)

    http://www.biscaynebayfishing.com/705

    This one shows the /blog page but leaves the URL alone. That’s not good.

    http://www.biscaynebayfishing.com/blog/705

    So try this:

    – Make a backup of the old .htaccess file called .htaccess-SAVE.

    – Delete the old file.

    – Generate a new one by resetting your permalinks. This is to get a fresh slate and should now look like this.

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    
    # END WordPress

    Add these lines above the # BEGIN WordPress part.

    # Send old /blog URLs to the new location
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^blog/(.*) http://www.biscaynebayfishing.com/$1 [R=301,L]
    </IfModule>
    
    # BEGIN WordPress

    And that should take care of the old /blog URLs.

    If anything goes wrong, copy the .htaccess-SAVE to .htaccess and you’ll be back as you were before.

    Edit: Looks like you’ve changed something alright. I’m now getting 500 errors for that site…

    I think I need slightly more complex rules, because /blog/ is not the only thing the old-page redirection needs to cover. I need to remove the .html too.
    The old site ALSO had this issue, and it still needs to be addressed. (The page got indexed and is sending visits like that.)

    I tried these rules, with no success.

    # Redirections for static pages made dynamic
    RewriteEngine On
    # Try query string page numbers first
    RewriteCond ^/blog/%{REQUEST_FILENAME}$ f
    RewriteRule ^/blog/(\d*)$ /?p=$1 [R=301]
    
    # Try removing /blog/ from url first
    RewriteCond ^/blog/%{REQUEST_FILENAME}$ -f
    RewriteRule  ^(.+)  /$1  [R=301]
    # If not found, try removing html
    RewriteCond %{REQUEST_FILENAME}\.html !-f
    RewriteRule ^(.*)\.html$ /$1 [R=301,L]

    PS: the default WP code is there, at the end.

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    I think I need slightly more complex rules, because /blog/ is not the only thing the old-page redirection needs to cover. I need to remove the .html too.

    Huh. .htaccess rules are fun (I have an odd sense of amusement) so tell you what: Post some examples and I’ll see if I can work out the conditions for the 301 redirection rules in .htaccess.

    Examples like

    If aaaa.html does not exist, re-write it to aaaa/
    If /blog/ send it to / etc.

    Once the rules are sorted out it shouldn’t be that much to add.

    Nice! Thanks, Jan Dembowski.

    Here it goes:

    If the URL contains /blog/ followed by (and ending with) a number,
    but is not /blog itself:
    Redirect to current domain/?p=number
    # (Should we make it the last rule? I don't know how WP manages to deliver a pretty permalinked page)
    
    If the URL contains /blog/ (but is not /blog itself) and the page is not found:
    Remove /blog (look for the page in the root of the domain)
    If still not found, remove any trailing .html

    Which is basically what the previous redirects were doing (the query number rule not working yet)
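
    The conditions listed above could be sketched roughly as follows. This is an untested illustration, not the code from this thread. It assumes the .htaccess file sits in the site root, that mod_rewrite is enabled, and that the block is placed above the # BEGIN WordPress section.

    ```apache
    # Sketch only: assumes this .htaccess lives in the document root
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /

    # 1. /blog/123 (a number after /blog/, never /blog itself) -> /?p=123
    RewriteRule ^blog/([0-9]+)/?$ /?p=$1 [R=301,L]

    # 2. Any other /blog/... URL that is not a real file or directory
    #    -> the same path at the root
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^blog/(.+)$ /$1 [R=301,L]

    # 3. A .html URL whose file does not exist -> the same path without .html
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.+)\.html$ /$1 [R=301,L]
    </IfModule>
    ```

    In a per-directory .htaccess the pattern is matched against the path without its leading slash, which is why the patterns start with blog/ rather than /blog/. A request such as /blog/page.html would take two hops under this sketch: rule 2 first, then rule 3.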

    I would still like to know how to prevent WP from delivering pages in any form other than pretty permalinks, just in case robots or idiots index a page like domain.com/123. (It was indexed, so obviously it was not serving a 404.)

    Thanks.

    PS: I always try to write rules that do NOT contain a hardcoded domain, just in case I have to move it, reuse it, or simply test it somewhere other than thisdomain.com.

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    PS: I always try to write rules that do NOT contain a hardcoded domain, just in case I have to move it, reuse it, or simply test it somewhere other than thisdomain.com.

    Sensible. Code re-use is our friend. 🙂 Easy to do with %{SERVER_NAME} too.
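
    As a rough illustration of that idea (a sketch, not the pastebin code), the earlier /blog rule could be written without the hardcoded domain by letting mod_rewrite fill in the host. It assumes the site is served over plain http and that %{SERVER_NAME} resolves to the host name you want visitors sent to.

    ```apache
    # Send old /blog/... URLs to the same path at the root of the current host
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    # (.+) rather than (.*) so that the bare /blog/ page itself is left alone
    RewriteRule ^blog/(.+)$ http://%{SERVER_NAME}/$1 [R=301,L]
    </IfModule>
    ```

    A root-relative target such as /$1 together with the R flag should have much the same effect, since mod_rewrite then prepends the current scheme and host by itself.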

    Try the code from this pastebin.com link.

    http://pastebin.com/zLL4q0Tu

    Put that above the line that starts with

    # BEGIN WordPress

    And remove everything else. That will perform the rewriting and anything else that doesn’t match will be sent to WordPress for handling.

    Good work, Jan. The explanations are something I always wanted when learning regexes 🙂
    Still not there, though.

    This code is better than mine in the sense that it does redirect the URLs ending in a number.

    However, the query-string URL stays with the number and doesn’t include the canonical. I don’t know whether that’s a WP “feature” that is damaging our blogs with duplicate content.

    You just reminded me that “existing files and folders” don’t include WP pages, because they don’t exist until the WP rules below.

    Should I hack the core for that? :S

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    I don’t know whether that’s a WP “feature” that is damaging our blogs with duplicate content.

    But it’s not damaging. When you go to a URL that’s not canonical, you get 301 redirected to the “correct” location. Search engines not only know that a 301 means “not duplicate content”, they also eventually remove the old URL from search results.

    Thus no duplicate content penalty. When I did an MT to WordPress migration, I 301’ed all the old URLs. After a couple of weeks the old URLs stopped showing up in searches completely and I removed the redirects.

    You just reminded me that “existing files and folders” don’t include WP pages, because they don’t exist until the WP rules below.

    True. To exclude those URLs you would need to add an explicit condition that ignores them.

    The .htaccess redirects are clever but they’re not that smart. 😉

    But it’s not damaging

    Yes, it is. I was talking about serving a page with /123. Those posts are showing that url in the canonical (not even /?p=123), instead of “/post-name”. That’s duplicate content.

    Which takes me to the subject of this post: if I can’t get WP to generate the proper canonical, how do I PREVENT WordPress from serving those pages? Those URLs don’t really exist, so if I can’t canonicalize them or redirect them to /post-name (AND show /post-name in the address bar), I’d prefer to deliver a 404.

    I haven’t had time yet to understand exactly when a redirected page shows the requested URL and when it shows the final URL in the address bar.

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    I disagree but that’s fine. Reasonable people can and do disagree sometimes.

    Give one of these a try.

    http://wordpress.org/extend/plugins/search.php?q=Disable+canonical

    If they do disable the canonical redirects then you should be able to 404 the incorrect URLs. That may directly solve it for you.

    Hehe, excuse my English. I was not objecting, just making sure we are talking about the same thing, or figuring out whether I missed something.

    The page called with /123 is showing the content for post-ID 123 correctly, but not showing /post-name either in the url or the canonical.

    Isn’t that an issue?

    I think it’s great to “predict” what the visitor tried to see, but there can’t be different canonicals for the same content. All of them should point to /page-name. Shouldn’t they?

    I don’t want to disable the canonicals! I just want one of these:
    1) Canonicals that point to the right place.
    2) /post-name returned in the URL, so no canonical is needed.
    3) /123 prediction disabled entirely, if neither of the above is possible.
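
    For the third option, one possible approach is to answer purely numeric URLs with a 404 before WordPress ever sees them. The following is an untested sketch, not something proposed in this thread. It assumes the .htaccess is in the site root, that the rule sits above the # BEGIN WordPress block, and that the installed Apache accepts a non-redirect status code in the R flag. It would also catch year archives such as /2011/ if the site uses them.

    ```apache
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    # Answer purely numeric paths such as /123 or /123/ with a 404.
    # With a non-3xx status the "-" target is ignored and rewriting stops.
    RewriteRule ^[0-9]+/?$ - [R=404,L]
    </IfModule>
    ```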

    Moderator Jan Dembowski

    (@jdembowski)

    Brute Squad and Volunteer Moderator

    Hehe, excuse my English

    Nope, not a problem. Excuse my lack of Spanish (assuming you speak Spanish) 😀

    The page called with /123 is showing the content for post-ID 123 correctly, but not showing /post-name either in the url or the canonical.

    With the .htaccess rules I proposed it all works out, if you also apply them outside of /blog.

    URLs ending in /123 will get 301 redirected to /?p=123. But that’s not the canonical URL either, so WordPress will in turn 301 redirect that to the correct URL, /some-slug-here.

    So this is how it goes:

    A request for http://site/123 gets 301 redirected to http://site/?p=123 via [R=301,L].

    The browser then requests http://site/?p=123 and is once again 301 redirected to http://site/some-slug-here, this time by WordPress itself rather than by .htaccess.

    The browser then requests http://site/some-slug-here and the web page is delivered with an HTTP status code of 200.

    It all works. The incorrect URLs return 301 and do not show up as duplicate content so no penalty.
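
    The first hop of that chain, applied both inside and outside /blog, might look roughly like this. It is a sketch under assumptions, not the pastebin code: the .htaccess is in the site root and the rule is placed above the # BEGIN WordPress block.

    ```apache
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    # /123 or /blog/123 (optional trailing slash) -> /?p=123
    # Caution: this also matches year archive URLs such as /2011/
    RewriteRule ^(blog/)?([0-9]+)/?$ /?p=$2 [R=301,L]
    </IfModule>
    ```

    The second hop, from /?p=123 to the pretty permalink, is left to WordPress and needs no rule here.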
