WordPress.org

Forums

[Plugin: WP Super Cache] Big Problem With Trailing Slash - Creates Duplicate Content (13 posts)

  1. definitelynot
    Member
    Posted 6 years ago #

    Hello,

    There is a problem with trailing slashes. I will try to explain in a few steps (I have wp-supercache 0.6.7 installed):
    1. We all know that http://www.example.com/page is not the same URL as http://www.example.com/page/ (Google knows that too and may consider this duplicate content).
    2. The default behavior of WordPress for a missing trailing slash is to 302 redirect (temporarily redirect) http://www.example.com/page to http://www.example.com/page/
    3. After installing wp-supercache, http://www.example.com/page seems to be treated the same as http://www.example.com/page/. If I request http://www.example.com/page (without trailing slash), it returns the cached copy of http://www.example.com/page/ (with trailing slash) instead of 302 redirecting to http://www.example.com/page/
    4. What happens next? Google will index both pages with 100% duplicate content, which is very wrong.

    All this can be tested using this tool and any post from these wp-supercache enabled sites: stylefrizz and weburbanist.

    Can this be fixed ?

  2. Donncha O Caoimh
    Member
    Posted 6 years ago #

    Unfortunately Apache automatically adds a slash to directories and since the pages are basically static files that's what's happening here. You could try switching to "Half on" mode instead.

  3. definitelynot
    Member
    Posted 6 years ago #

    I'm afraid I didn't explain this very well. Every blog using wp-supercache has this problem. Let's take an example from your blog (http://ocaoimh.ie/2007/11/03/where-have-all-the-wispas-gone/).

    I do:
    GET /2007/11/03/where-have-all-the-wispas-gone/ HTTP/1.1
    (with trailing slash) and I get:
    HTTP Status Code: HTTP/1.1 200 OK

    Next I do:
    GET /2007/11/03/where-have-all-the-wispas-gone HTTP/1.1
    (without trailing slash) and I get:
    HTTP Status Code: HTTP/1.1 200 OK

    This means that you have 2 pages on your blog with 100% duplicate content.

    The default behavior of WordPress is to 302 redirect the page without the trailing slash to the one with it, as in the following example (I will use http://www.onlyaphoto.com/conceptual-photography/horse-power/ because this blog is not using wp-supercache):

    GET /conceptual-photography/horse-power/ HTTP/1.1
    returns:
    HTTP Status Code: HTTP/1.1 200 OK

    and
    GET /conceptual-photography/horse-power HTTP/1.1 (without trailing slash)
    returns:
    HTTP Status Code: HTTP/1.1 302 Moved Temporarily
    which is correct.
    This way, google will never find 2 pages with the same content.

    Please, make your own tests and maybe you can find a fix for this problem. Practically every blog using wp-supercache has this problem.

    Thank you
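
The manual GET checks in the post above can be scripted. The following is a minimal Python sketch, not a tool mentioned in the thread: the helper names (`slash_variants`, `fetch_status`, `looks_duplicated`) are made up for illustration, and it assumes the site is reachable over plain HTTP. The standard library's `http.client` does not follow redirects, so a 301 or 302 is reported as-is rather than hidden behind the final 200.

```python
import http.client
from urllib.parse import urlsplit


def slash_variants(url):
    """Return the (with_slash, without_slash) forms of a URL's path.

    Only meaningful for non-root paths such as /page or /page/.
    """
    path = urlsplit(url).path
    stripped = path.rstrip("/")
    return stripped + "/", stripped


def fetch_status(host, path):
    """Issue one GET over plain HTTP and return the raw status code."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def looks_duplicated(status_with_slash, status_without_slash):
    """True when both forms answer 200, i.e. neither one redirects."""
    return status_with_slash == 200 and status_without_slash == 200
```

To test a post, pass each path from `slash_variants(url)` to `fetch_status(host, path)` and give the two results to `looks_duplicated`. A `True` result corresponds to the "200 OK for both requests" case described in the post.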

  4. andylav
    Member
    Posted 6 years ago #

    hi there,
    would the following additions to the rewrite conditions in the .htaccess file solve these problems?

    RewriteCond %{REQUEST_URI} !^.*/$
    RewriteCond %{REQUEST_URI} !^.*//.*$

    in theory this should reject URLs that either have a missing trailing slash or contain doubled slashes, so the user shouldn't receive a 'super cached' page, and wordpress should then handle the 301/302 redirect.

    thanks,
    andy

  5. andylav
    Member
    Posted 6 years ago #

    as my rewriting skills are a bit lacking, the top line perhaps should be:
    RewriteCond %{REQUEST_URI} !^.*[^/]$
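
To see what the two proposed conditions let through, here is a small Python sketch that mimics them with the `re` module. It is only an approximation under stated assumptions: Apache uses PCRE, but for patterns this simple Python's `re` should behave the same way, and the function names are invented for this example. A leading `!` in a RewriteCond means the condition holds when the pattern does not match.

```python
import re

# Patterns copied from the proposed RewriteCond lines, without the leading "!".
ENDS_WITHOUT_SLASH = re.compile(r"^.*[^/]$")  # corrected first condition
ENDS_WITH_SLASH = re.compile(r"^.*/$")        # first attempt
HAS_DOUBLE_SLASH = re.compile(r"^.*//.*$")    # second condition


def passes_conditions(request_uri):
    """True when both negated conditions hold, so the cached file may be served."""
    no_missing_slash = ENDS_WITHOUT_SLASH.search(request_uri) is None
    no_double_slash = HAS_DOUBLE_SLASH.search(request_uri) is None
    return no_missing_slash and no_double_slash


def first_attempt_passes(request_uri):
    """Same check using the first attempt, which inverts the intended test."""
    no_trailing_slash = ENDS_WITH_SLASH.search(request_uri) is None
    no_double_slash = HAS_DOUBLE_SLASH.search(request_uri) is None
    return no_trailing_slash and no_double_slash
```

With the corrected pattern, `/page/` passes while `/page` and `/a//b/` do not, so those two fall through to WordPress. With the first attempt the result for `/page/` and `/page` is reversed, which is why the correction was needed.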

  6. Donncha O Caoimh
    Member
    Posted 6 years ago #

    Andy - good idea. The second rule filters out any url with "//" in the request, doesn't it? I see that redirects too. Nice job!

    definitelynot - thanks for reporting the problem, and persisting in explaining it!

    Google is very good at figuring out what the real urls of a page are but I'll add a fix for this. Unfortunately it's not as simple as adding those two rules because some people actually use urls without trailing slashes, the fiends! :)

  7. ma2t
    Member
    Posted 6 years ago #

    This is similar to an issue I brought up a week ago.

    See http://wordpress.org/support/topic/196538?replies=7. It also caches direct IP address access and non-www requests.

  8. mactac
    Member
    Posted 5 years ago #

    have these 2 issues been fixed? (the slashes and the www/non-www)

  9. Donncha O Caoimh
    Member
    Posted 5 years ago #

    mactac - yes.

  10. mactac
    Member
    Posted 5 years ago #

    In which version? I just tested mine and it's not fixed...

  11. Donncha O Caoimh
    Member
    Posted 5 years ago #

    It's been fixed for some time. See http://ocaoimh.ie/2009/02/10/press-play-on-tape-retro-action - it redirects to the url with the slash at the end.
    http://www.ocaoimh.ie/2009/02/10/press-play-on-tape-retro-action/ redirects and removes the www. using the normal WordPress redirection code.

    Make sure your .htaccess rules have these lines. They do the ending slash thing and should have been added automatically. If you changed permalink structure since writing the rules that might be why they're not there.

    RewriteCond %{REQUEST_URI} !^.*[^/]$
    RewriteCond %{REQUEST_URI} !^.*//.*$
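
One way to check a site for these lines is a short script. This is a hypothetical Python sketch, not part of the plugin: `missing_conditions` is a made-up helper that takes the text of a .htaccess file and reports which of the two conditions quoted above are absent, ignoring differences in spacing.

```python
# The two conditions quoted in the post above.
REQUIRED_CONDITIONS = (
    "RewriteCond %{REQUEST_URI} !^.*[^/]$",
    "RewriteCond %{REQUEST_URI} !^.*//.*$",
)


def missing_conditions(htaccess_text):
    """Return the required RewriteCond lines not found in the given text."""
    # Collapse runs of whitespace so indentation and spacing do not matter.
    present = {" ".join(line.split()) for line in htaccess_text.splitlines()}
    return [cond for cond in REQUIRED_CONDITIONS if cond not in present]
```

Read the file with `open(path).read()` and pass the result in. An empty list means both conditions are present. It does not verify that they sit inside the right rule block.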

  12. mactac
    Member
    Posted 5 years ago #

    no, I checked your example:
    http://ocaoimh.ie/2009/02/10/press-play-on-tape-retro-action

    That's a 302 redirect.

    It's *not* fixed.

  13. Donncha O Caoimh
    Member
    Posted 5 years ago #

    That's WordPress. File a bug in trac at http://trac.wordpress.org/ but also check out the recent wp-hackers thread on this. The archives are online somewhere, as this was thrashed out there too. It's not a supercache bug. Nothing to do with this plugin at all.

Topic Closed

This topic has been closed to new replies.
