Support » Plugin: Legacy URL Forwarding » [Plugin: Legacy URL Forwarding] Lowercase permalinks

  • Resolved kitchin

    (@kitchin)


    I am building on this plugin to handle converting legacy sites with mixed-case permalinks to WordPress lowercase permalinks. The old site was Drupal, and I want to redirect the old mixed-case URL’s to lowercase without making 404’s. For most pages, WordPress looks up the slug case-insensitively and everything works. But I have found at least two cases where WP fails:

    1. It keys on the last part of the slug:
    url:  http://www.example.com/Important-Event/press/
    get:  http://www.example.com/press/
    want: http://www.example.com/important-event/press/
    
    2. It does not handle mapped domains well:
    url:  http://www.example2.com/events/Another-Event/
    get:  404
    want: http://www.example.com/events/another-event/

    The approach of OllyBenson’s plugin Legacy URL Forwarding is to hook into the filter/action ‘404_template’. It’s a pretty simple 3-line plugin that allows editors to put the legacy URL in the page’s meta tags, which the the plugin looks up to make a redirect. I extended the plugin to always do a get_permalink() lookup of the lowercase of the slug, when not already lowercase. As you can see, I am trying to handle this problem without using mod_rewrite, that is, without changing .htaccess.

    Extending OllyBenson’s plugin works for #2 above, but not #1, because that one never hits 404. Both the shorter and longer slugs exist. For #1, I use the filter hook ‘redirect_canonical’ to make the same substitution as in ‘404_template’.

    On the other hand, #2 needs ‘404_template’, because even if I set the right URL in ‘redirect_canonical’, WP still goes to 404. This seems like a bug.

    So both hooks are needed.

    The URL rewriting code in WP is complex. The latter steps are in “wp-includes/canonical.php”, which includes the ‘redirect_canonical’ hook. The function there even recurses once. But by this point, much URL processing has already happened. In fact, hundreds of preg_match()’s have been tried! So it would be more efficient to fix the URL earlier than “wp-includes/canonical.php”, but the code before that is even more complex, in “wp-includes/class-wp.php”. It would make sense to fix all this before reaching 404, but ‘redirect_canonical’ is not soon enough.

    The zillions of WP preg_match()’s happen on every URL I have checked, not just mismatched URL’s with problems. They only stop if a match is found. The list of regexs is in $wp_rewrite->rules, and seems to include five regexes for every page on the site! Here is how the list starts:

    category/(.+?)/feed/(feed|rdf|rss|rss2|atom)/?$
    category/(.+?)/(feed|rdf|rss|rss2|atom)/?$
    category/(.+?)/page/?([0-9]{1,})/?$
    category/(.+?)/?$
    ...
    robots\.txt$
    ...
    .*wp-feed.php$
    .*wp-commentsrss2.php$

    which is reasonable enough. But then every page on the site has these five, which are executed on every URL!

    (important-event/press)/trackback/?$
    (important-event/press)/feed/(feed|rdf|rss|rss2|atom)/?$
    (important-event/press)/(feed|rdf|rss|rss2|atom)/?$
    (important-event/press)/page/?([0-9]{1,})/?$
    (important-event/press)/comment-page-([0-9]{1,})/?$
    (important-event/press)(/[0-9]+)?/?$

    Any attachment on the page doubles the number of its regexes in the list.

    This is crazy tunes and anyone could see how to short circuit it. Anyway, I decided not to try to hook in early. Since every page lookup is doing this stuff, there’s not much point in avoiding it just for my rewrites.

    So I am using the two hooks ‘404_template’ and ‘redirect_canonical’.

    Next steps:
    Test out WP 3.3 release candidate to see if the code has improved.
    Release the code if anybody wants it.

    Seems like rewriting mixed-case URL’s (correctly) would have already be a plugin, but my search didn’t find it.

    http://wordpress.org/extend/plugins/url-forwarding/

Viewing 3 replies - 1 through 3 (of 3 total)
  • kitchin

    (@kitchin)

    Specific bug fixes for Olly Benson’s ver. 1.3 of http://wordpress.org/extend/plugins/url-forwarding/ see also his http://code.olib.co.uk/redirect

    1. [bug] add_filter(‘redirect_canonical’,…) and do the same logic as the ‘404_template’ hook, but return the result instead of doing a wp_redirect(). Or maybe doing the wp_redirect() would mean the 404 hook is not needed. I did not try that.

    2. [feature] Option to look for other domains in the URL.

    3. [feature] Lowercase default lookups as described above. In other words, even if no meta tags have been set, the plugin would detect mixed case slugs and try to fix them.

    kitchin

    (@kitchin)

    WP 3.3, just released, is supposed to speed up permalink performance. I have yet to investigate, but this looks hopeful: “Eliminate verbose rewrite rules for ambiguous rewrite structures, resulting in massive performance gains.” http://codex.wordpress.org/Version_3.3
    Will need to test the hooks are still hitting right.

    Guess what? WP3.3 fixed ALL these issues. 🙂

Viewing 3 replies - 1 through 3 (of 3 total)
  • The topic ‘[Plugin: Legacy URL Forwarding] Lowercase permalinks’ is closed to new replies.