WordPress.org

[closed] Prevent wp serving pages by number? (31 posts)

  1. SocialBlogsite
    Member
    Posted 2 years ago #

    My WP installation is serving pages when requested like mydomain.com/443
    Not even with a ?p=

    It's driving me crazy because I find indexed pages in Google with the same content as the "pretty" permalinked ones.

    How do I prevent WP serving pages like that?

  2. SocialBlogsite
    Member
    Posted 2 years ago #

    More specifically:
    Given /blog setup as the "blog" page (WP installed in root)

    Page -> takes me to
    /blog/695 -> /blog/ (the blog page)
    /695 -> 404 not found

    We just moved WP from /blog to /, and although we found these duplicated pages after the move, the problem was already there before.

    There are pages indexed at Google as /blog/695, whose cached (indexed) content is the correct content for page ID 695. Oddly, it appears correctly as /blog/695 for Feb 22 with the old content, and as /695 with the new content after the move, although the latter still shows the same old date of Feb 22.

    So I understand there's some mismatch or delay in Google's indexing until the old and new URLs are merged with their date, cached content, and visit counts (it shows both new and old URLs)…

    For now I'll wait, but preventing WP from serving pages with numbers in the URL is a must.

  3. SocialBlogsite
    Member
    Posted 2 years ago #

    Correction:
    It only happens with posts.
    And if post #695 DOES exist, /695 gets it correctly. The above example was based on a non-existing page, just to show the difference between not-found pages under /blog and right under /. One shows a 404 and the other the blog page.

  4. Can you provide a link to demonstrate what's going on?

    I think you want mydomain.com/123 to get 301 redirected to the correct mydomain.com/?p=123 which would also send it to the fancy permalink if it was set.

  5. SocialBlogsite
    Member
    Posted 2 years ago #

    Well, not necessarily. I don't care if /123 is not redirected to /?p=123 as long as WHATEVER is retrieved has a canonical link pointing to /page-name, or is a proper 404.
    I'll try your suggestion in .htaccess as soon as I get a regex working (I suck at Apache's regexes).

    Anything that avoids a duplicated content issue would be fine.

    The site is biscaynebayfishing.com.

    I created these rules, placed before WP's in the .htaccess file, when we moved the blog to the root (and created a /blog page):

    # Redirections for static pages made dynamic
    RewriteEngine On
    # Try removing /blog/ from the URL first
    RewriteCond ^/blog/?%{REQUEST_FILENAME}$ -f
    RewriteRule  ^(.+)  /$1  [R=301]
    # If not found, try removing html
    RewriteCond %{REQUEST_FILENAME}\.html !-f
    RewriteRule ^(.*)\.html$ /$1 [R=301,L]
  6. I hadn't suggested a .htaccess but I was thinking along those lines. :)

    I don't think you necessarily need to change anything. Here's why.

    This is a canonical URL

    http://www.biscaynebayfishing.com/miami-biscayne-bay-and-flamingo-fishing-reportsnook-and-bonefish

    That's post number id 705 (it's in the HTML).

    These 3 URLs work and send the browser to the correct location.

    http://www.biscaynebayfishing.com/blog/miami-biscayne-bay-and-flamingo-fishing-reportsnook-and-bonefish
    http://www.biscaynebayfishing.com/?p=705
    http://www.biscaynebayfishing.com/blog/?p=705

    Each of those requests was answered with a 301 Moved Permanently and sent to the canonical URL.

    This request received a 404 (200 actually, but the HTML page said 404)

    http://www.biscaynebayfishing.com/705

    This sends the browser to the /blog but leaves the URL alone. That's not good.

    http://www.biscaynebayfishing.com/blog/705

    So try this:

    - Make a backup of the old .htaccess file called .htaccess-SAVE.

    - Delete the old file.

    - Generate a new one by resetting your permalinks. This is to get a fresh slate and should now look like this.

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    
    # END WordPress

    Add these lines above the # BEGIN WordPress part.

    # Send old /blog URLs to the new location
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^blog/(.*) http://www.biscaynebayfishing.com/$1 [R=301,L]
    </IfModule>
    
    # BEGIN WordPress

    And that should take care of the old /blog URLs.

    If anything goes wrong, copy the .htaccess-SAVE to .htaccess and you'll be back as you were before.

    Edit: Looks like you've changed something alright. I'm now getting 500 errors for that site...

  7. SocialBlogsite
    Member
    Posted 2 years ago #

    I think I need slightly more complex rules, because /blog/ is not the only thing I need to cover when redirecting the old pages. I need to remove the .html too.
    Also, the old site ALREADY had this issue, and it still needs to be addressed. (The page got indexed and is sending visits like that.)

    I tried these rules, with no success.

    # Redirections for static pages made dynamic
    RewriteEngine On
    # Try query string page numbers first
    RewriteCond ^/blog/%{REQUEST_FILENAME}$ f
    RewriteRule ^/blog/(\d*)$ /?p=$1 [R=301]
    
    # Try removing /blog/ from the URL first
    RewriteCond ^/blog/%{REQUEST_FILENAME}$ -f
    RewriteRule  ^(.+)  /$1  [R=301]
    # If not found, try removing html
    RewriteCond %{REQUEST_FILENAME}\.html !-f
    RewriteRule ^(.*)\.html$ /$1 [R=301,L]

    PS: the default WP code is there, at the end.

  8. I think I need slightly more complex rules, because /blog/ is not the only thing I need to cover when redirecting the old pages. I need to remove the .html too.

    Huh. .htaccess rules are fun (I have an odd sense of amusement) so tell you what: Post some examples and I'll see if I can work out the conditions for the 301 redirection rules in .htaccess.

    Examples like

    If aaaa.html does not exist, re-write it to aaaa/
    If /blog/ send it to / etc.

    Once the rules are sorted out it shouldn't be that much to add.

  9. SocialBlogsite
    Member
    Posted 2 years ago #

    Nice! Thanks, Jan Dembowski.

    Here it goes:

    If the URL contains /blog/ followed by (and ending with) a number,
    but is not /blog itself:
    redirect to current domain/?p=number
    # (Should we make it the last rule? I don't know how WP manages to deliver a pretty permalinked page)
    
    If the URL contains /blog/ (but is not /blog itself) and the page is not found:
    remove /blog (find the page in the root of the domain).
    If still not found, remove any trailing .html.
    Which is basically what the previous redirects were doing (the query number rule not working yet)

    I would still like to know how to prevent WP from delivering pages in any way other than pretty permalinks, just in case robots or idiots index a page like domain.com/123 (it was indexed, so obviously it was not serving a 404).

    Thanks.

    PS: I always try to write rules that do NOT contain a hardcoded domain, just in case I have to move them, reuse them, or simply test them somewhere other than thisdomain.com

  10. PS: I always try to write rules that do NOT contain a hardcoded domain, just in case I have to move them, reuse them, or simply test them somewhere other than thisdomain.com

    Sensible. Code re-use is our friend. :) Easy to do with %{SERVER_NAME} too.

    Try the code from this pastebin.com link.

    http://pastebin.com/zLL4q0Tu

    Put that above the line that starts with

    # BEGIN WordPress

    And remove everything else. That will perform the rewriting and anything else that doesn't match will be sent to WordPress for handling.
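
    The pastebin contents are not reproduced in this thread. As a rough illustration only, here is a minimal sketch of rules matching the conditions described in post 9. It is not the pastebin code: it assumes Apache with mod_rewrite, WordPress installed in the site root, and numeric /blog/ URLs that always refer to post IDs. Substitutions that start with / let Apache fill in the current host on an external redirect, so no domain is hardcoded.

    ```apache
    # Hypothetical sketch, NOT the contents of the pastebin link above.
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /

    # /blog/123 -> /?p=123 (numeric post IDs only; /blog itself is untouched)
    RewriteRule ^blog/([0-9]+)/?$ /?p=$1 [R=301,L]

    # /blog/anything -> /anything, only when nothing exists on disk at the old path
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^blog/(.+)$ /$1 [R=301,L]

    # anything.html -> anything, only when the .html file no longer exists
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.+)\.html$ /$1 [R=301,L]
    </IfModule>
    ```

    A request such as /blog/some-page.html would take two hops with these rules: first to /some-page.html, then to /some-page, after which WordPress handles it.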

  11. SocialBlogsite
    Member
    Posted 2 years ago #

    Good work, Jan. The explanations are something I always wanted when learning regexes :)
    Still not there, though.

    This code is better than mine in the sense that this one does redirect the number-trailing URLs.

    However, the query-string URL stays with the number and doesn't include the canonical. I don't know if that's a WP "feature" that is damaging our blogs with duplicate content?

    You just reminded me that "existing files and folders" don't include WP pages, because they don't exist until the WP rules below.

    Should I hack the core for that? :S

  12. I don't know if that's a WP "feature" that is damaging our blogs with duplicate content?

    But it's not damaging. When you go to a URL that's not canonical, you get 301 redirected to the "correct" location. The search engines not only know that 301 means "not duplicate content", they also eventually remove the old URL from the search results.

    Thus no duplicate content penalty. When I did an MT to WordPress migration, I 301'ed all the old URLs. After a couple of weeks the old URLs stopped showing up in searches completely and I removed the redirects.

    You just reminded me that "existing files and folders" don't include WP pages, because they don't exist until the WP rules below.

    True. To exclude those URLs you would need to explicitly put that as a condition to ignore those URLs.

    The .htaccess redirects are clever but they're not that smart. ;)

  13. SocialBlogsite
    Member
    Posted 2 years ago #

    But it's not damaging

    Yes, it is. I was talking about serving a page with /123. Those posts are showing that url in the canonical (not even /?p=123), instead of "/post-name". That's duplicate content.

    Which takes me to the subject of this post: if I can't get WP to generate the proper canonical, how do I PREVENT WordPress from serving those pages? Those URLs don't really exist, so if I can't canonicalize them or redirect them to /post-name (AND show /post-name in the address bar), I'd prefer to deliver a 404.

    I haven't had time yet to understand exactly how a redirected page shows the requested or the final url in the address bar.

  14. I disagree but that's fine. Reasonable people can and do disagree sometimes.

    Give one of these a try.

    http://wordpress.org/extend/plugins/search.php?q=Disable+canonical

    If they do disable the canonical redirects then you should be able to 404 the incorrect URLs. That may directly solve it for you.

  15. SocialBlogsite
    Member
    Posted 2 years ago #

    Hehe, excuse my English. I was not opposing you, just making sure we are talking about the same thing, or figuring out whether I missed something.

    The page called with /123 is showing the content for post-ID 123 correctly, but not showing /post-name either in the url or the canonical.

    Isn't that an issue?

    I think it's great to "predict" what the visitor tried to see, but there can't be different canonicals for the same content. All of them should point to /page-name. Shouldn't they?

    I don't want to disable the canonicals! I just want one of:
    1) Canonicals that point to the right place.
    2) /post-name returned in the URL, so no canonical is needed.
    3) /123 prediction disabled altogether, if neither of the above is possible.
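
    For option 3, a minimal .htaccess sketch, assuming Apache 2.x with mod_rewrite and a permalink structure in which no legitimate URL is purely numeric (date-based structures with year archives such as /2012/ would break). Placed above the # BEGIN WordPress block, it answers purely numeric paths with a 404 before WordPress ever sees them. Untested on this site:

    ```apache
    # Hypothetical sketch: refuse purely numeric paths such as /123 or /123/
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    # "-" means no substitution; R=404 makes Apache answer with 404 Not Found
    RewriteRule ^[0-9]+/?$ - [R=404,L]
    </IfModule>
    ```

    Because WordPress never handles the request, the visitor gets Apache's 404 response (its ErrorDocument, if one is configured) rather than the theme's 404 template.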

  16. Hehe, excuse my English

    Nope, not a problem. Excuse my lack of Spanish (assuming you speak Spanish) :D

    The page called with /123 is showing the content for post-ID 123 correctly, but not showing /post-name either in the url or the canonical.

    With the .htaccess rules I proposed it all works out, if you also apply it to outside of /blog too.

    URLs ending /123 will get 301 redirected to /?p=123. But that's not the canonical URL either so WordPress will 301 redirect that also to the correct URL /some-slug-here.

    So this is how it goes:

    A request for http://site/123 gets 301 redirected to http://site/?p=123 via [R=301,L].

    The browser then requests http://site/?p=123 and is once again 301 redirected, this time by WordPress itself, to http://site/some-slug-here.

    The browser then requests http://site/some-slug-here and the web page is delivered with an HTTP status code of 200.

    It all works. The incorrect URLs return 301 and do not show up as duplicate content so no penalty.
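
    The first hop in that chain could be sketched as a single rule placed above # BEGIN WordPress. This is a sketch under stated assumptions, not the pastebin rules: it assumes that no real URL on the site is purely numeric, and that WordPress answers /?p=123 with a redirect to the slug. If WordPress instead sends /?p=123 back to /123, this rule would create a redirect loop.

    ```apache
    # Hypothetical sketch: /123 or /123/ -> /?p=123, then WordPress takes over
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^([0-9]+)/?$ /?p=$1 [R=301,L]
    </IfModule>
    ```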

  17. SocialBlogsite
    Member
    Posted 2 years ago #

    A request for http://site/123 gets 301 redirected to http://site/?p=123 via [R=301,L].

    The browser then requests http://site/?p=123 and is once again 301 redirected, this time by WordPress itself, to http://site/some-slug-here.

    The browser then requests http://site/some-slug-here and the web page is delivered with an HTTP status code of 200.

    Yes, I understood your explanation before. That's why I got so excited :)
    But that's not happening.

    When I request
    http://biscaynebayfishing.com/blog/287

    It is successfully converted to
    http://biscaynebayfishing.com/287

    and then it shows me the right content for the post-ID 287, as if I had typed
    http://biscaynebayfishing.com/287 in my browser
    which was already working, thanks to WP guessing magic.

    and THEN…

    Unfortunately, the process ends there, and the url at the address bar is the above one, as well as the canonical. No slug / pretty permalink is retrieved / shown / canonicalized.

    The URL and canonical link stay as
    http://biscaynebayfishing.com/287

    I need to fix that asap.

  18. It's working for me using a different post. I used this tool (first hit in the search engine) http://www.seoconsultants.com/tools/check-server-headers-tool/ to check the headers.

    When I visit http://www.biscaynebayfishing.com/blog/705 I get 301'ed to http://www.biscaynebayfishing.com/?p=705 like so

    1. Requesting: http://www.biscaynebayfishing.com/blog/705
    HEAD /blog/705 HTTP/1.1
    Connection: Keep-Alive
    Keep-Alive: 300
    Accept:*/*
    Host: http://www.biscaynebayfishing.com
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)

    Server Response:
    HTTP/1.1 301 Moved Permanently
    Date: Wed, 28 Mar 2012 16:25:13 GMT
    Server: Apache
    Location: http://www.biscaynebayfishing.com/?p=705
    Content-Type: text/html; charset=iso-8859-1

    Visiting http://www.biscaynebayfishing.com/?p=705 I get 301'ed to http://www.biscaynebayfishing.com/miami-biscayne-bay-and-flamingo-fishing-reportsnook-and-bonefish:

    1. Requesting: http://www.biscaynebayfishing.com/?p=705
    HEAD /?p=705 HTTP/1.1
    Connection: Keep-Alive
    Keep-Alive: 300
    Accept:*/*
    Host: http://www.biscaynebayfishing.com
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)

    Server Response:
    HTTP/1.1 301 Moved Permanently
    Date: Wed, 28 Mar 2012 16:27:37 GMT
    Server: Apache
    X-Pingback: http://www.biscaynebayfishing.com/xmlrpc.php
    Location: http://www.biscaynebayfishing.com/miami-biscayne-bay-and-flamingo-fishing-reportsnook-and-bonefish
    Content-Type: text/html; charset=UTF-8

    And of course for http://www.biscaynebayfishing.com/miami-biscayne-bay-and-flamingo-fishing-reportsnook-and-bonefish I get a 200 status code.

    1. Requesting: http://www.biscaynebayfishing.com/miami-biscayne-bay-and-flamingo-fishing-reportsnook-and-bonefish
    HEAD /miami-biscayne-bay-and-flamingo-fishing-reportsnook-and-bonefish HTTP/1.1
    Connection: Keep-Alive
    Keep-Alive: 300
    Accept:*/*
    Host: http://www.biscaynebayfishing.com
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)

    Server Response:
    HTTP/1.1 200 OK
    Date: Wed, 28 Mar 2012 16:31:09 GMT
    Server: Apache
    X-Pingback: http://www.biscaynebayfishing.com/xmlrpc.php
    Link: ; rel=shortlink
    Set-Cookie: PHPSESSID=olbd9tqcutkvcji3vahmjiajk3; path=/
    Vary: Accept-Encoding
    Content-Type: text/html; charset=UTF-8

    So no penalty. 301 -> 301 -> 200 is fine.

    Edit: Are there other rules above the ones I suggested? If there are, they may be sending the request to a cached copy, such as with WP Super Cache or W3TC.

  19. SocialBlogsite
    Member
    Posted 2 years ago #

    Are there other rules above the ones I suggested? If there are, they may be sending the request to a cached copy, such as with WP Super Cache or W3TC.

    No. There are no other rules besides the default WP ones below these.

    So no penalty. 301 -> 301 -> 200 is fine.

    Ohhh... I get what you mean. The problem is you are browsing a PAGE, not a POST.
    /123 works differently for pages and posts.

    I was testing a post; posts were under /blog/.
    Pages didn't move, they just dropped the .html

    My test shows:
    /blog/287 301 -> /?p=287
    /?p=287 301 -> /287
    /287 301 -> /287
    … and that's a dead end.

  20. SocialBlogsite
    Member
    Posted 2 years ago #

    Jan:

    I just removed ALL the .htaccess rules (other than WP's), waited many hours, and even flushed my computer's DNS cache to be doubly sure. I confirmed the rules are no longer applied (.html files are not found) and:

    ALL OUR RULES WERE DOING NOTHING.

    All the above behaviors remain the same:
    Pages like /?p=287 are still redirected to /287
    /287 keeps redirecting to itself.
    Canonicals still do not point to /post-name

    So, excuse me, but I'll post this again. When others see a topic like this they think I'm being taken care of and won't get involved in a discussion at this stage.
    Thanks for your time.

  21. ...You are being taken care of and (as you know) with each post the conversation gets visibility and people do see and read this...

    The rules I proposed were tested and do work 100% for posts. The reply I left above not only indicates that but provides evidence of such.

    I deal with a lot of compliance and audit issues, evidence is my friend. ;)

    But those .htaccess rules are intentionally designed for handling post IDs and do not do anything for page IDs. For those you would need to craft additional RewriteCond to handle those page IDs.

    I really do think the issue you are having is with WordPress canonical URLs. That's a feature and can be disabled with a plugin.

    http://wordpress.org/extend/plugins/search.php?q=disable+canonical

    Try one of those, that should prevent any further guessing and either produce a post or a 404.

  22. SocialBlogsite
    Member
    Posted 2 years ago #

    esmi:

    That's it. Stop it guys.

    YOU KNOW long discussions like this are not of interest to anybody anymore, and will be forgotten until some day in a few MONTHS some desperate user with the same LITTLE idea of how to fix it (and help me) will probably contribute with ANOTHER QUESTION or bump… and you STILL FORCE ME TO STAY THERE? (closing the new one I just opened).

    And then, when I DO want to leave a discussion open ON PURPOSE to get another opinion other than the ones from the same 2 moderators who exhausted their power to "convince me" of their point of view or shut up… you CLOSE IT?

    So… why don't you just give us your script so we can all help you to run the show you want in "your" forum and we save some time?
    That way I can post all happy WP features, help others to shut up or submit a quiet ticket nobody notices, or convince them they should work as you programmed and no change is good…

    I'm tired of moderator's abuse. If you don't like my questions and points of view, LEAVE THEM OPEN. You are not the only sources of knowledge here. Even less the only source of opinion diversity.

    If you are SO TIGHT about rules, OK, DELETE THIS ONE, and reopen the other. This is useless, and if you consider it's of use for others, then DON'T FORCE ME to take it as the solution to my problem, and move to ANOTHER POST that better describes what I need.

    Let me guess: This issue was already found in WP core and you prefer nobody talk about it until fixed, right?

  23. SocialBlogsite
    Member
    Posted 2 years ago #

    Jan:
    I'm not saying the rules don't work. Regexes are meant to work.
    I'm saying THEY DON'T DO A THING in a WordPress installation with no plugins.

    I just tested and re-tested the blog and everything works THE SAME without your rules. If you have another idea of something I can reset/clear to triple-test they are not still being applied… as I said, I REMOVED them all, and all works the same. Urls are guessed, even inside blog/, so your (and mine) rules are useless.

    The whole problem seems to be completely unrelated to the rewrite rules.
    I'll try to remove the canonical links, but I'll create my own, pointing to where they should. If I have to do that, it means WP canonical is a joke and won't work.

  24. esmi
    Forum Moderator
    Posted 2 years ago #

    Sorry? I've not taken any part in this discussion.

  25. SocialBlogsite
    Member
    Posted 2 years ago #

    esmi: Nobody said you did.
    Close this topic if you don't want to have two for "the same" issue, and re-open the other please. This one is finished for me, and I need to address the issue in a different way, as the other post said.

  26. esmi
    Forum Moderator
    Posted 2 years ago #

    Nobody said you did.

    Oh, sorry. I assumed you meant me when you mentioned "2 moderators". I can only see one other person involved in this discussion apart from yourself.

  27. You mentioned esmi in this post which is why everyone was confused.

    We closed this topic: http://wordpress.org/support/topic/stop-wordpress-guessing-urls?replies=2 It was redundant. You may want to address it in a 'different' way, but for us to be able to help you, we need to see the whole story. If you'd like to rephrase in a post here, please do, but that's how we roll.

    More specifically:
    Given /blog setup as the "blog" page (WP installed in root)

    Page -> takes me to
    /blog/695 -> /blog/ (the blog page)
    /695 -> 404 not found

    What are your permalink rules?

  28. Also keep in mind that everyone here is honestly trying to assist you and NOT frustrate you. I am sure you understand that already, but it is something that is worth repeating.

    Please be patient while I repeat myself some more.

    I honestly believe that you should avail yourself of one of these plugins and just turn off canonical URLs entirely.

    http://wordpress.org/extend/plugins/search.php?q=disable+canonical

    That will prevent WordPress from guessing and redirecting a browser to what it thinks is the right URL. The reader will either have the correct permalink or they'll get a 404.

    If you do not wish to go that route then you can accomplish what you've described using .htaccess rewrites.

    The .htaccess additions are predicated on mod_rewrite being installed and working. As your installation has fancy permalinks, that requirement is met for you.

    Those rewrite rules will work on a bare WordPress installation without a single plugin required. That's how I tested those rules on my test WordPress installation.

    Those .htaccess rules I proposed will work for any post ID, and with a little modification they can be applied to /123 as well. But the rules I wrote are intended for /?p=123, which is the post ID parameter.

    Pages use the /?page_id=123 parameter. If there is a post /?p=123 and a page /?page_id=123 then no amount of mod_rewrite rules will solve that one.

    If you have a page ID such as 567 and want /567 to go to that page, then that's just another rewrite rule above the rest to handle that redirect to /?page_id=567.
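
    A minimal sketch of such a rule, using the example page ID 567 from the paragraph above. It assumes the rule sits above any generic numeric rule so that it matches first, and above the # BEGIN WordPress block:

    ```apache
    # Hypothetical sketch: send /567 (a known page ID) to its page_id URL
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^567/?$ /?page_id=567 [R=301,L]
    </IfModule>
    ```

    One such rule is needed per page ID, since the number alone does not say whether it refers to a post or a page.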

    I hope that this helps and you are able to accomplish what you are trying to do.

  29. SocialBlogsite
    Member
    Posted 2 years ago #

    Jan: Thanks for your help
    You are helping, and those I'm talking about, they know it's about them. It's mostly moderators/programmers who have both power to shut up people and the workload when others find each-other with the same bugs.

    And this is not an exception: I FIXED IT.

    It's a known bug, but seems easier to write two lines and close my topics than sending a link to the bug.

    The problem is that post_name values are NOT updated for all the existing posts when you switch from ugly to pretty permalinks. I don't find it so bad to take care of the urgent emerging issues first and the rest later, but covering up and abusing moderation power really bothers me.

    So it works as long as your posts were created AFTER your (pretty) permalink settings.

    I figured it out while trying to make my own permalinks: nothing worked, not get_permalink, not post->post_name, nothing. AND since the comparison between the requested URL and the returned one fails to match because post_name doesn't hold a pretty permalink slug, even single posts and pages included a canonical link.

    Now every plugin for canonical links works, but they are not needed, because since WP 2.9 canonical links come in the box.

    For every user who was ignored and told that the missing slug update "is not a bug" and that WP works "as intended", instead of being given help and an admission that it should be done so other things (and plugins) work, here's the plugin you need to run to patch the WP hole and regenerate the slugs:

    http://wordpress.org/extend/plugins/rp-recreate-slugs/

  30. SocialBlogsite
    Member
    Posted 2 years ago #

    By the way.
    This solution would have been found easier in my new closed topic:
    Close to the top, or in some STICKY or well named topic which nobody did yet.

    Here it's buried under 20+ posts about what didn't fix it. If you are an internaut, you know only a few get down here.

    Oh, wait… such a straightforward topic would alert of something not working in WP, and probably will be quickly deleted or never posted.

    WP community is priceless. Passing the moderators to get to them, costs an arm and a leg.

Topic Closed

This topic has been closed to new replies.
