I am building on this plugin to handle converting legacy sites with mixed-case permalinks to WordPress lowercase permalinks. The old site was Drupal, and I want to redirect the old mixed-case URL’s to lowercase without making 404’s. For most pages, WordPress looks up the slug case-insensitively and everything works. But I have found at least two cases where WP fails:
1. It keys on the last part of the slug:
2. It does not handle mapped domains well:
The approach of OllyBenson’s plugin Legacy URL Forwarding is to hook into the filter/action ‘404_template’. It’s a pretty simple 3-line plugin that allows editors to put the legacy URL in the page’s meta tags, which the the plugin looks up to make a redirect. I extended the plugin to always do a get_permalink() lookup of the lowercase of the slug, when not already lowercase. As you can see, I am trying to handle this problem without using mod_rewrite, that is, without changing .htaccess.
Extending OllyBenson’s plugin works for #2 above, but not #1, because that one never hits 404. Both the shorter and longer slugs exist. For #1, I use the filter hook ‘redirect_canonical’ to make the same substitution as in ‘404_template’.
On the other hand, #2 needs ‘404_template’, because even if I set the right URL in ‘redirect_canonical’, WP still goes to 404. This seems like a bug.
So both hooks are needed.
The URL rewriting code in WP is complex. The latter steps are in “wp-includes/canonical.php”, which includes the ‘redirect_canonical’ hook. The function there even recurses once. But by this point, much URL processing has already happened. In fact, hundreds of preg_match()’s have been tried! So it would be more efficient to fix the URL earlier than “wp-includes/canonical.php”, but the code before that is even more complex, in “wp-includes/class-wp.php”. It would make sense to fix all this before reaching 404, but ‘redirect_canonical’ is not soon enough.
The zillions of WP preg_match()’s happen on every URL I have checked, not just mismatched URL’s with problems. They only stop if a match is found. The list of regexs is in $wp_rewrite->rules, and seems to include five regexes for every page on the site! Here is how the list starts:
which is reasonable enough. But then every page on the site has these five, which are executed on every URL!
Any attachment on the page doubles the number of its regexes in the list.
This is crazy tunes and anyone could see how to short circuit it. Anyway, I decided not to try to hook in early. Since every page lookup is doing this stuff, there’s not much point in avoiding it just for my rewrites.
So I am using the two hooks ‘404_template’ and ‘redirect_canonical’.
Test out WP 3.3 release candidate to see if the code has improved.
Release the code if anybody wants it.
Seems like rewriting mixed-case URL’s (correctly) would have already be a plugin, but my search didn’t find it.