• Resolved kitchin

    (@kitchin)


    I am building on this plugin to convert legacy sites with mixed-case permalinks to WordPress lowercase permalinks. The old site was Drupal, and I want to redirect the old mixed-case URLs to lowercase without producing 404s. For most pages, WordPress looks up the slug case-insensitively and everything works. But I have found at least two cases where WP fails:

    1. It keys on the last part of the slug:
    url:  http://www.example.com/Important-Event/press/
    get:  http://www.example.com/press/
    want: http://www.example.com/important-event/press/
    
    2. It does not handle mapped domains well:
    url:  http://www.example2.com/events/Another-Event/
    get:  404
    want: http://www.example.com/events/another-event/

    The approach of OllyBenson’s plugin Legacy URL Forwarding is to hook into the filter/action ‘404_template’. It’s a pretty simple 3-line plugin that allows editors to put the legacy URL in the page’s meta tags, which the plugin looks up to make a redirect. I extended the plugin to always do a get_permalink() lookup of the lowercased slug when the slug is not already lowercase. As you can see, I am trying to handle this problem without using mod_rewrite, that is, without changing .htaccess.

    Extending OllyBenson’s plugin works for #2 above, but not #1, because that one never hits 404. Both the shorter and longer slugs exist. For #1, I use the filter hook ‘redirect_canonical’ to make the same substitution as in ‘404_template’.

    On the other hand, #2 needs ‘404_template’, because even if I set the right URL in ‘redirect_canonical’, WP still goes to 404. This seems like a bug.

    So both hooks are needed.
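
    The substitution that both hooks share can be sketched on its own. This is a standalone Python illustration, not the plugin's PHP: `lowercase_redirect_target` and `CANONICAL_HOST` are hypothetical names, and sending every mixed-case URL to a single canonical host is an assumption made to cover case #2. The real plugin would still confirm the result with get_permalink() before redirecting.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the one host that mixed-case legacy URLs should end up on.
CANONICAL_HOST = "www.example.com"


def lowercase_redirect_target(url, canonical_host=CANONICAL_HOST):
    """Return the lowercase URL to redirect to, or None if no redirect is needed.

    Only the path is lowercased. The query string and fragment are passed
    through unchanged, since their values may be case-sensitive.
    """
    parts = urlsplit(url)
    lowered_path = parts.path.lower()
    if lowered_path == parts.path:
        return None  # path is already lowercase; leave the request alone
    return urlunsplit(
        (parts.scheme, canonical_host, lowered_path, parts.query, parts.fragment)
    )
```

    With the two failing URLs above, this returns http://www.example.com/important-event/press/ for case #1 and http://www.example.com/events/another-event/ for case #2, and None for a URL whose path is already lowercase.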

    The URL rewriting code in WP is complex. The later steps are in “wp-includes/canonical.php”, which includes the ‘redirect_canonical’ hook. The function there even recurses once. But by this point, much URL processing has already happened. In fact, hundreds of preg_match() calls have been tried! So it would be more efficient to fix the URL earlier than “wp-includes/canonical.php”, but the code before that, in “wp-includes/class-wp.php”, is even more complex. It would make sense to fix all this before reaching 404, but ‘redirect_canonical’ is not early enough.

    The zillions of WP preg_match() calls happen on every URL I have checked, not just mismatched URLs with problems. They only stop when a match is found. The list of regexes is in $wp_rewrite->rules, and seems to include a block of regexes for every page on the site! Here is how the list starts:

    category/(.+?)/feed/(feed|rdf|rss|rss2|atom)/?$
    category/(.+?)/(feed|rdf|rss|rss2|atom)/?$
    category/(.+?)/page/?([0-9]{1,})/?$
    category/(.+?)/?$
    ...
    robots\.txt$
    ...
    .*wp-feed.php$
    .*wp-commentsrss2.php$

    which is reasonable enough. But then every page on the site has these six, which are tried on every URL!

    (important-event/press)/trackback/?$
    (important-event/press)/feed/(feed|rdf|rss|rss2|atom)/?$
    (important-event/press)/(feed|rdf|rss|rss2|atom)/?$
    (important-event/press)/page/?([0-9]{1,})/?$
    (important-event/press)/comment-page-([0-9]{1,})/?$
    (important-event/press)(/[0-9]+)?/?$

    Any attachment on the page doubles the number of its regexes in the list.

    This is crazy tunes and anyone could see how to short circuit it. Anyway, I decided not to try to hook in early. Since every page lookup is doing this stuff, there’s not much point in avoiding it just for my rewrites.
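
    To make the cost concrete, here is a rough Python sketch of a first-match-wins scan over a handful of the rules listed above. It is not WordPress code: `first_match` is a hypothetical helper, the rule list is a tiny sample of $wp_rewrite->rules, and anchoring each rule at the start of the request path is an assumption about how the rules are applied.

```python
import re

# A few rules copied from the list above, in order. The real
# $wp_rewrite->rules array is far longer.
RULES = [
    r"category/(.+?)/feed/(feed|rdf|rss|rss2|atom)/?$",
    r"category/(.+?)/(feed|rdf|rss|rss2|atom)/?$",
    r"category/(.+?)/page/?([0-9]{1,})/?$",
    r"category/(.+?)/?$",
    r"robots\.txt$",
    r"(important-event/press)/trackback/?$",
    r"(important-event/press)/feed/(feed|rdf|rss|rss2|atom)/?$",
    r"(important-event/press)/(feed|rdf|rss|rss2|atom)/?$",
    r"(important-event/press)/page/?([0-9]{1,})/?$",
    r"(important-event/press)/comment-page-([0-9]{1,})/?$",
    r"(important-event/press)(/[0-9]+)?/?$",
]


def first_match(path, rules=RULES):
    """Try each rule in order and stop at the first hit.

    Returns (rule, attempts), or (None, attempts) if nothing matches.
    """
    attempts = 0
    for rule in rules:
        attempts += 1
        # Assumption: rules are anchored at the start of the request path;
        # re.match provides that anchoring.
        if re.match(rule, path):
            return rule, attempts
    return None, attempts
```

    In this sample, first_match("category/news/") stops after 4 attempts, while first_match("important-event/press/") needs all 11 because the page's own rules sit at the end of the list. Each additional page pushes later pages further down the scan.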

    So I am using the two hooks ‘404_template’ and ‘redirect_canonical’.

    Next steps:
    Test out WP 3.3 release candidate to see if the code has improved.
    Release the code if anybody wants it.

    Seems like rewriting mixed-case URLs (correctly) would already be a plugin, but my search didn’t find one.

    http://wordpress.org/extend/plugins/url-forwarding/

Viewing 3 replies - 1 through 3 (of 3 total)
  • Thread Starter kitchin

    (@kitchin)

    Specific bug fixes and feature suggestions for Olly Benson’s ver. 1.3 of http://wordpress.org/extend/plugins/url-forwarding/ (see also his http://code.olib.co.uk/redirect):

    1. [bug] add_filter(‘redirect_canonical’,…) and do the same logic as the ‘404_template’ hook, but return the result instead of doing a wp_redirect(). Or maybe doing the wp_redirect() would mean the 404 hook is not needed. I did not try that.

    2. [feature] Option to look for other domains in the URL.

    3. [feature] Lowercase default lookups as described above. In other words, even if no meta tags have been set, the plugin would detect mixed case slugs and try to fix them.

    Thread Starter kitchin

    (@kitchin)

    WP 3.3, just released, is supposed to speed up permalink performance. I have yet to investigate, but this looks hopeful: “Eliminate verbose rewrite rules for ambiguous rewrite structures, resulting in massive performance gains.” http://codex.wordpress.org/Version_3.3
    I will need to test that the hooks still fire correctly.

    Thread Starter kitchin

    (@kitchin)

    Guess what? WP 3.3 fixed ALL these issues. 🙂

  • The topic ‘[Plugin: Legacy URL Forwarding] Lowercase permalinks’ is closed to new replies.