Permalinks in Double-Byte Languages (4 posts)

  1. pankaj
    Posted 11 years ago

    Hi all,
    This feature has been missing since I started blogging in Hindi on WordPress, and my guess is that the same is true for other double-byte languages. Could a Japanese or Chinese blogger confirm this? What I am talking about is WordPress's ability to generate URIs from the post name. It works in English, and probably in other ASCII-based languages too, but it produces nothing for Hindi posts. This can be seen on my latest dev blog at http://dev.pnarula.com/ag13. Permalinks show as
    instead of
    Hindi (Devanagari script) is permitted in URLs, as can be seen from this URL on dmoz
    One way I have gotten around this problem is to use the post_id in URIs instead of the post name. But I think the time has come to get post names working in WP as well.
    Any help is much appreciated. Keep up the good work, guys. Thanks to Michael, the Kubrick Theme Dude, for making this theme available; it is what prompted me to ask this question.
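The symptom described above can be reproduced with a small Python sketch. The function `ascii_slug` is hypothetical and is not WordPress's actual `sanitize_title` (which is PHP); it only assumes, as the thread suggests, a sanitizer that keeps ASCII letters and digits. Every Devanagari character falls outside that set, so a Hindi title collapses to an empty slug and the post_id permalink is all that remains.

```python
import re


def ascii_slug(title: str) -> str:
    """Hypothetical ASCII-only slug function (not WordPress's sanitize_title).

    Every run of characters outside a-z and 0-9 becomes one hyphen, so
    Devanagari letters are treated exactly like punctuation and vanish.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def permalink(title: str, post_id: int) -> str:
    """Post-name permalink when a slug exists, else the post_id workaround."""
    slug = ascii_slug(title)
    return f"/{slug}/" if slug else f"/{post_id}/"


print(permalink("Hello World", 7))        # /hello-world/
print(permalink("नमस्ते दुनिया", 42))      # /42/ (the Hindi slug came out empty)
```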

  2. Ryan Boren
    WordPress Dev
    Posted 11 years ago

    URIs use an 82-character subset of US-ASCII, as defined in RFC 2396. There are schemes for encoding internationalized (i18n) URIs, and to ensure uniform encoding they dictate UTF-8. This isn't too difficult for someone who runs their blog in UTF-8: simply replace every character not allowed by RFC 2396 with its UTF-8 byte values, percent-encoded in hex (%XX). If the blog is not running in UTF-8, we get into the mess of juggling encodings. This is why we added code to CVS that simply uses the post ID if sanitize_title returns an empty string. If someone wants to write a sanitize_title plugin that performs i18n URI encoding, go for it. Take a look at seems_utf8() and remove_accents() for some hints. If the plugin is general-purpose enough to handle all scenarios, we could incorporate it into WordPress. Anyone want to have a go?
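A minimal Python sketch of the approach outlined above, assuming the title arrives as raw bytes: decode it as UTF-8, drop ASCII punctuation, join words with hyphens, and percent-encode the UTF-8 bytes of everything else with the standard library's `urllib.parse.quote`. The names `encoded_slug` and `post_slug` are hypothetical, the UTF-8 validity check is only a rough stand-in for `seems_utf8()`, and since WordPress itself is PHP this is an illustration of the idea, not its implementation.

```python
import re
from urllib.parse import quote


def encoded_slug(raw_title: bytes) -> str:
    """Hypothetical i18n slug builder (not WordPress's sanitize_title).

    Returns "" when the title is not valid UTF-8, so the caller can fall
    back to the post ID instead of guessing the encoding.
    """
    try:
        # Rough stand-in for seems_utf8(): reject anything that is not UTF-8.
        text = raw_title.decode("utf-8")
    except UnicodeDecodeError:
        return ""
    text = text.strip().lower()
    # Drop ASCII punctuation but keep every non-ASCII character, so
    # Devanagari vowel signs and the virama survive intact.
    text = "".join(c for c in text if ord(c) >= 128 or c.isalnum() or c in " -")
    # Collapse runs of whitespace and hyphens into a single hyphen.
    text = re.sub(r"[\s-]+", "-", text).strip("-")
    # quote() percent-encodes the UTF-8 bytes of each character outside
    # its safe set; ASCII letters, digits and "-" pass through unchanged.
    return quote(text, safe="-")


def post_slug(raw_title: bytes, post_id: int) -> str:
    """Use the encoded slug if there is one, otherwise the post ID."""
    return encoded_slug(raw_title) or str(post_id)


print(post_slug("Hello, World!".encode("utf-8"), 7))   # hello-world
print(post_slug("नमस्ते".encode("utf-8"), 42))
print(post_slug(b"\xe9t\xe9", 42))                      # 42 (Latin-1, not UTF-8)
```

For example, the Hindi title "नमस्ते" becomes `%E0%A4%A8%E0%A4%AE%E0%A4%B8%E0%A5%8D%E0%A4%A4%E0%A5%87`, three bytes per Devanagari character, while a title in a non-UTF-8 encoding falls back to the post ID just as the CVS change does.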

  3. pankaj
    Posted 11 years ago

    Thanks, Ryan, for the informative explanation. I wish I could do this myself.
    In the interim: you said the code that uses the post ID when sanitize_title returns an empty string is in CVS. Does that mean it will be in the coming nightly build?

  4. Ryan Boren
    WordPress Dev
    Posted 11 years ago

    Yes, it's in the nightly. We do this for posts and categories; I still need to fix authors. We might also make the category and author "nice names" editable, the way the post slug is.

This topic has been closed to new replies.