[Plugin: Legacy URL Forwarding] Lowercase permalinks
-
I am building on this plugin to handle converting legacy sites with mixed-case permalinks to WordPress lowercase permalinks. The old site was Drupal, and I want to redirect the old mixed-case URL’s to lowercase without making 404’s. For most pages, WordPress looks up the slug case-insensitively and everything works. But I have found at least two cases where WP fails:
1. It keys on the last part of the slug: url: http://www.example.com/Important-Event/press/ get: http://www.example.com/press/ want: http://www.example.com/important-event/press/ 2. It does not handle mapped domains well: url: http://www.example2.com/events/Another-Event/ get: 404 want: http://www.example.com/events/another-event/
The approach of OllyBenson’s plugin Legacy URL Forwarding is to hook into the filter/action ‘404_template’. It’s a pretty simple 3-line plugin that allows editors to put the legacy URL in the page’s meta tags, which the the plugin looks up to make a redirect. I extended the plugin to always do a get_permalink() lookup of the lowercase of the slug, when not already lowercase. As you can see, I am trying to handle this problem without using mod_rewrite, that is, without changing .htaccess.
Extending OllyBenson’s plugin works for #2 above, but not #1, because that one never hits 404. Both the shorter and longer slugs exist. For #1, I use the filter hook ‘redirect_canonical’ to make the same substitution as in ‘404_template’.
On the other hand, #2 needs ‘404_template’, because even if I set the right URL in ‘redirect_canonical’, WP still goes to 404. This seems like a bug.
So both hooks are needed.
The URL rewriting code in WP is complex. The latter steps are in “wp-includes/canonical.php”, which includes the ‘redirect_canonical’ hook. The function there even recurses once. But by this point, much URL processing has already happened. In fact, hundreds of preg_match()’s have been tried! So it would be more efficient to fix the URL earlier than “wp-includes/canonical.php”, but the code before that is even more complex, in “wp-includes/class-wp.php”. It would make sense to fix all this before reaching 404, but ‘redirect_canonical’ is not soon enough.
The zillions of WP preg_match()’s happen on every URL I have checked, not just mismatched URL’s with problems. They only stop if a match is found. The list of regexs is in $wp_rewrite->rules, and seems to include five regexes for every page on the site! Here is how the list starts:
category/(.+?)/feed/(feed|rdf|rss|rss2|atom)/?$ category/(.+?)/(feed|rdf|rss|rss2|atom)/?$ category/(.+?)/page/?([0-9]{1,})/?$ category/(.+?)/?$ ... robots\.txt$ ... .*wp-feed.php$ .*wp-commentsrss2.php$
which is reasonable enough. But then every page on the site has these five, which are executed on every URL!
(important-event/press)/trackback/?$ (important-event/press)/feed/(feed|rdf|rss|rss2|atom)/?$ (important-event/press)/(feed|rdf|rss|rss2|atom)/?$ (important-event/press)/page/?([0-9]{1,})/?$ (important-event/press)/comment-page-([0-9]{1,})/?$ (important-event/press)(/[0-9]+)?/?$
Any attachment on the page doubles the number of its regexes in the list.
This is crazy tunes and anyone could see how to short circuit it. Anyway, I decided not to try to hook in early. Since every page lookup is doing this stuff, there’s not much point in avoiding it just for my rewrites.
So I am using the two hooks ‘404_template’ and ‘redirect_canonical’.
Next steps:
Test out WP 3.3 release candidate to see if the code has improved.
Release the code if anybody wants it.Seems like rewriting mixed-case URL’s (correctly) would have already be a plugin, but my search didn’t find it.
- The topic ‘[Plugin: Legacy URL Forwarding] Lowercase permalinks’ is closed to new replies.