WordPress.org

RewriteRules for old .html pages, can't be 301 (lesbian) (4 posts)

  1. Oroboros2
    Member
    Posted 1 year ago #

    I have a fairly good grasp of mod_rewrite and have been attempting to remap some old .html files back to a new WordPress page.

    The main problem I have is that a 301 redirect won't work for the client. There was extensive SEO done to the old URL by someone other than me. A 301 redirect works just fine to get human visitors to the right page, but the search engines have dropped their page rankings and the traffic has died off along with sales of their lesbian novels.

    Most specifically I need to internally serve /lesbian-novels.html as /lesbian-novels/ which is a valid WordPress page.

    When I took over the problem, the Simple 301 Redirects plugin was installed on top of some .htaccess 301 redirects.

    Case #1 has the default .htaccess rules and all redirection plugins disabled. I wrote a custom 404 function that dumps every environment variable whose value contains the word 'lesbian'; the first output below comes from that function:

    $ lynx -dump http://www.susangabriel.com/lesbian-novels.html
    ...
       REQUEST_URI         /lesbian-novels.html
       REDIRECT_SCRIPT_URL /lesbian-novels.html
       REDIRECT_SCRIPT_URI http://www.susangabriel.com/lesbian-novels.html
       SCRIPT_URL          /lesbian-novels.html
       SCRIPT_URI          http://www.susangabriel.com/lesbian-novels.html
       REDIRECT_URL        /lesbian-novels.html
    $ lynx -head -dump http://www.susangabriel.com/lesbian-novels.html
    HTTP/1.0 404 Not Found
    Date: Sun, 30 Dec 2012 23:59:04 GMT
    Server: Apache
    X-Powered-By: PHP/5.3.15
    P3P: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
    Set-Cookie: Cart66DBSID=KAYB464P33I10BGZIZ9V201OGZP0NA9IK8THDQEB; path=/
    X-Pingback: http://www.susangabriel.com/xmlrpc.php
    Expires: Wed, 11 Jan 1984 05:00:00 GMT
    Cache-Control: no-cache, must-revalidate, max-age=0
    Pragma: no-cache
    Connection: close
    Content-Type: text/html; charset=UTF-8

    This is basically the default .htaccess for the chosen permalink structure, /%category%/%postname%/:

    $ grep -v ^# .htaccess
    AcceptPathInfo On
    Options +FollowSymlinks
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]

    Case #2: I added a simple rule to rewrite requests for /lesbian-novels.html to /lesbian-novels/, which is a valid URL within WordPress:

    $ lynx -head -dump http://www.susangabriel.com/lesbian-novels/
    HTTP/1.0 200 OK
    Date: Mon, 31 Dec 2012 00:12:04 GMT
    Server: Apache
    X-Powered-By: PHP/5.3.15
    P3P: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
    Set-Cookie: Cart66DBSID=VVIZ2CIUCYNWZKICOUSHA06UN2LTKZ351REKVC8C; path=/
    X-Pingback: http://www.susangabriel.com/xmlrpc.php
    Connection: close
    Content-Type: text/html; charset=UTF-8
    $ cat .htaccess
    AcceptPathInfo On
    Options +FollowSymlinks
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    
    RewriteCond %{REQUEST_URI} lesbian\-novels\.html
    RewriteRule ^(.*)\.html$ /$1/ [PT]
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    # Pass-through the rewrite so it gets processed
    # as /lesbian-novels/
    
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    $  lynx -dump http://www.susangabriel.com/lesbian-novels.html
    ...
       REQUEST_URI                  /lesbian-novels.html
       REDIRECT_REDIRECT_SCRIPT_URL /lesbian-novels.html
       REDIRECT_REDIRECT_SCRIPT_URI http://www.susangabriel.com/lesbian-novels.html
       REDIRECT_SCRIPT_URL          /lesbian-novels.html
       REDIRECT_SCRIPT_URI          http://www.susangabriel.com/lesbian-novels.html
       SCRIPT_URL                   /lesbian-novels.html
       SCRIPT_URI                   http://www.susangabriel.com/lesbian-novels.html
       REDIRECT_URL                 /lesbian-novels/

    So by setting the PT flag, I changed REDIRECT_URL from /lesbian-novels.html to /lesbian-novels/ and in the process created two new environment variables, REDIRECT_REDIRECT_SCRIPT_URL and REDIRECT_REDIRECT_SCRIPT_URI, which don't seem to do me any real good. REQUEST_URI is unchanged.

    Case #3: I tried setting some environment variables in the RewriteRule:

    $ grep -v ^# .htaccess
    AcceptPathInfo On
    Options +FollowSymlinks
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_URI} lesbian\-novels\.html
    RewriteRule ^(.*)\.html$ /$1/ [PT,E=PATH_INFO:/$1/,E=REQUEST_URI:/$1/,E=REDIRECT_REDIRECT_SCRIPT_URL:/$1/,E=REDIRECT_SCRIPT_URL:/$1/,E=SCRIPT_URL:/$1/]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    $ lynx -dump http://www.susangabriel.com/lesbian-novels.html
    ...
    404 Not Found
    
       REQUEST_URI /lesbian-novels.html
       REDIRECT_REDIRECT_SCRIPT_URL /lesbian-novels/
       REDIRECT_REDIRECT_SCRIPT_URI http://www.susangabriel.com/lesbian-novels.html
       REDIRECT_REDIRECT_PATH_INFO /lesbian-novels/
       REDIRECT_REDIRECT_REQUEST_URI /lesbian-novels/
       REDIRECT_REDIRECT_REDIRECT_REDIRECT_SCRIPT_URL /lesbian-novels/
       REDIRECT_REDIRECT_REDIRECT_SCRIPT_URL /lesbian-novels/
       REDIRECT_SCRIPT_URL /lesbian-novels/
       REDIRECT_SCRIPT_URI http://www.susangabriel.com/lesbian-novels/
       SCRIPT_URL /lesbian-novels/
       SCRIPT_URI http://www.susangabriel.com/lesbian-novels/
       REDIRECT_URL /lesbian-novels/

    And that's where I'm stuck. I've tried various flags in the RewriteRule to get the desired behavior but everything kicks back a 404 unless I use a 301 redirect which doesn't solve the client's AWOL SEO linkage.

  2. Oroboros2
    Member
    Posted 1 year ago #

    So it seems that when the request for /lesbian-novels.html gets through the final chain of rules and is handled by index.php, WordPress looks back at REQUEST_URI, and because it doesn't recognize "lesbian-novels.html" it returns a 404. I'm pretty sure I'm not allowed to change that value, since I tried to in my RewriteRule with the flag below, which worked for the other variables but not for REQUEST_URI:

    E=REQUEST_URI:/$1/

    I think this means I cannot Rewrite except by 301 redirect, and my client is just S.O.L. unless they want to put their static page back up again, so the search engines are happy with a 200 result code since 301 seems not OK for ranking purposes anymore.
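
    A possible way to get a 200 at the old URL without touching WordPress is to have Apache proxy the request back to the same site, so that WordPress receives a fresh request whose REQUEST_URI is already /lesbian-novels/. The sketch below assumes mod_proxy and mod_proxy_http are loaded and that the host permits the [P] flag in .htaccess; nothing in this thread confirms either, and many shared hosts disable them. It has not been tested on this site.

    ```apache
    # Sketch only: requires mod_proxy and mod_proxy_http (assumed, not confirmed).
    # Place after "RewriteBase /" and before the WordPress catch-all rule.
    # [P] hands the request to mod_proxy, which fetches /lesbian-novels/ as a
    # new request; the visitor still sees /lesbian-novels.html and receives
    # that response's status code.
    RewriteRule ^lesbian-novels\.html$ http://www.susangabriel.com/lesbian-novels/ [P]
    ```

    Each hit costs a second request to the server, and the same content would then be reachable at two URLs, so this is a trade-off and not a drop-in replacement for the 301.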

    I tried the other obvious fix: creating a WordPress page named "lesbian-novels.html". It *almost* does what I need it to....

    $ lynx -head -dump http://www.susangabriel.com/lesbian-novels.html
    HTTP/1.0 301 Moved Permanently
    ...
    Location: http://www.susangabriel.com/lesbian-novels.html/
    $ lynx -head -dump http://www.susangabriel.com/lesbian-novels.html/
    HTTP/1.0 200 OK

  3. That's weird. A 301 redirect works the same for humans as cylons-- I mean Google Bots.

    "Most specifically I need to internally serve /lesbian-novels.html as /lesbian-novels/ which is a valid WordPress page."

    Redirect 301 /lesbian-novels.html /lesbian-novels/

    Really that's it. If you want to redirect all .htmls, we can do that too.
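
    For the catch-all case mentioned above, a mod_alias pattern redirect could look like the following sketch. The pattern is an assumption about which URLs should match, not a rule taken from this site's configuration; it maps any path ending in .html to the same path with a trailing slash.

    ```apache
    # Specific redirect from the post above
    Redirect 301 /lesbian-novels.html /lesbian-novels/
    # Hypothetical catch-all: /anything.html -> /anything/
    RedirectMatch 301 ^/(.+)\.html$ /$1/
    ```

    When several mod_alias directives appear in the same context, the first one that matches takes precedence, so any page-specific redirect whose target differs from the pattern belongs above the RedirectMatch line.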

  4. Oroboros2
    Member
    Posted 1 year ago #

    In my searches I came across advice against mixing mod_alias and mod_rewrite, and that seems reasonable to me as a general rule. In this simple case I see no real danger in doing it that way either... Thanks for the suggestion, Mika. Your way even has an advantage because those redirects are more easily separated in the .htaccess from the block added by WordPress.

    I questioned whether the drop-off in traffic and sales (which I'm told returns if the old .html file is restored) was due to some oddity in the way that the 301 was being sent. I tested out a variety of possible redirects. So far we haven't been able to return the traffic with a 301 as we can with a 200.

    It may be that they allowed the URL to be 404 for too long before getting the 301 in place.

    I did this inside mod_rewrite because I wanted to see visually the order in which things are processed and to ensure the rewrites happen before the final WordPress handler kicks in for all missing files/folders:

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    
    # Begin Mike's additions
    RewriteCond %{REQUEST_URI} index\.htm$
    RewriteRule . http://www.susangabriel.com [L,R=301]
    RewriteCond %{REQUEST_URI} wildflower\.html$
    RewriteRule . http://www.susangabriel.com/read/wildflower/ [L,R=301]
    RewriteCond %{REQUEST_URI} seekingsarasummers\.html$
    RewriteRule . http://www.susangabriel.com/read/seekingsarasummers/ [L,R=301]
    RewriteCond %{REQUEST_URI} \.html$
    RewriteRule ^(.*)\.html$ http://www.susangabriel.com/$1/ [L,R=301]
    # End Mike's additions
    
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    
    # END WordPress

    I did that and documented it knowing there's a chance it will get overwritten again in the future. Now I'm thinking of going back to the mod_alias 301 you suggest (I also advised the client I saw no reason the 301 shouldn't be working, but had observed oddities in how opensiteexplorer.com viewed the URLs claiming a 404 when I was getting 301).
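
    Since WordPress rewrites everything between its # BEGIN WordPress and # END WordPress markers when it regenerates the file, one way to keep the custom redirects is to move them above the markers. The sketch below is a restructuring of the rules shown above, with each RewriteCond folded into its rule's pattern; it is meant to be equivalent but has not been tested on this site.

    ```apache
    # Custom redirects, kept outside the WordPress markers so they
    # are not overwritten when WordPress regenerates its own block.
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule index\.htm$ http://www.susangabriel.com [L,R=301]
    RewriteRule wildflower\.html$ http://www.susangabriel.com/read/wildflower/ [L,R=301]
    RewriteRule seekingsarasummers\.html$ http://www.susangabriel.com/read/seekingsarasummers/ [L,R=301]
    # Catch-all: any other .html URL goes to the same path with a trailing slash
    RewriteRule ^(.*)\.html$ http://www.susangabriel.com/$1/ [L,R=301]
    </IfModule>

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    # END WordPress
    ```

    Because the custom block comes first in the file, its redirects are evaluated before the WordPress catch-all, which preserves the processing order of the original arrangement.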

    I did find some cosmetic differences in the 301 that get sent through various entities. For example:

    The Smart 404 plugin sends "HTTP/1.0 301 Moved Permanently", "Connection: close" and "Content-Type: text/html; charset=UTF-8" (tried this on a whim).

    $ lynx -head -dump http://www.susangabriel.com/lesbian-novels.html
    HTTP/1.0 301 Moved Permanently
    Date: Sat, 29 Dec 2012 22:59:15 GMT
    Server: Apache
    X-Powered-By: PHP/5.3.15
    P3P: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
    Set-Cookie: Cart66DBSID=IXINGRMMBT244H9MHGXRNI17HYTSEJ76EDVVGCVM; path=/
    X-Pingback: http://www.susangabriel.com/xmlrpc.php
    Expires: Wed, 11 Jan 1984 05:00:00 GMT
    Last-Modified: Sat, 29 Dec 2012 22:59:16 GMT
    Cache-Control: no-cache, must-revalidate, max-age=0
    Pragma: no-cache
    Location: http://www.susangabriel.com/lesbian-novels/
    Connection: close
    Content-Type: text/html; charset=UTF-8

    The Simple 301 Redirects plugin sends "HTTP/1.1 301 Moved Permanently" but omits "Connection: close" and doesn't specify a character set in the Content-Type (this is the redirect method in effect when I came on to look at the problem).

    $ lynx -head -dump http://www.susangabriel.com/lesbian-novels.html
    HTTP/1.1 301 Moved Permanently
    Date: Sat, 29 Dec 2012 22:58:07 GMT
    Server: Apache
    X-Powered-By: PHP/5.3.15
    Location: http://www.susangabriel.com/lesbian-novels/
    Content-Type: text/html

    I figure the Apache .htaccess redirect is the simplest and most recognized way to redirect. This method sends the "HTTP/1.1 301 Moved Permanently" header with "Connection: close" and "Content-Type: text/html; charset=iso-8859-1" (I switched to this method and it is currently active using the .htaccess shown above, but sales have not returned).

    $ lynx -head -dump http://www.susangabriel.com/lesbian-novels.html
    HTTP/1.1 301 Moved Permanently
    Date: Sat, 29 Dec 2012 22:56:21 GMT
    Server: Apache
    Location: http://www.susangabriel.com/lesbian-novels/
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

    You can tell that WordPress never saw that last request because the response has no X-Powered-By header and sets no cookies.

Topic Closed

This topic has been closed to new replies.
