Viewing 3 replies - 1 through 3 (of 3 total)
  • URIs use an 82-character subset of US-ASCII, as defined in RFC 2396. There are schemes for encoding i18n URIs, and to assure uniform encoding they dictate UTF-8. This isn’t too difficult for someone who is running their blog in UTF-8: simply replace every character outside the RFC 2396 set with the percent-encoded hex values of its UTF-8 bytes (one %XX per byte). If the blog is not running in UTF-8, we get into the mess of juggling encodings. This is why we added code to CVS that simply uses the post id if sanitize_title returns an empty string. If someone wants to write a sanitize_title plugin that performs i18n URI encoding, go for it. Take a look at seems_utf8() and remove_accents() for some hints. If the plugin is general purpose enough to handle all scenarios, we could incorporate it into WordPress. Anyone want to have a go?
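
For anyone wanting to experiment, here is a rough sketch of the percent-encoding step described above. It is written in Python purely for illustration and is not WordPress code; the function name `encode_slug` is made up for this example, and it assumes the title is already a proper Unicode string (so it sidesteps the non-UTF-8 blog problem entirely).

```python
from urllib.parse import quote

# The "mark" characters that RFC 2396 lists as unreserved, alongside
# ASCII letters and digits (which quote() always leaves alone).
RFC2396_MARK = "-_.!~*'()"


def encode_slug(title: str) -> str:
    """Percent-encode every character outside the RFC 2396 unreserved set.

    Hypothetical helper: each such character is converted to its UTF-8
    bytes, and every byte is written as %XX.
    """
    return quote(title, safe=RFC2396_MARK, encoding="utf-8")


if __name__ == "__main__":
    print(encode_slug("日本語"))       # multi-byte characters become %XX runs
    print(encode_slug("hello-world"))  # plain ASCII slugs pass through unchanged
```

A real sanitize_title plugin would also have to decide what to do with spaces, reserved characters such as `/` and `?`, and input that is not valid UTF-8, which this sketch does not attempt.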

    Thread Starter pankaj

    (@pankaj)

    Thanks, Ryan, for the informative explanation. I wish I could do this myself.
    In the interim, you said “code to use the post id if sanitize_title returns an empty string is in CVS”. Does that mean it will be in the coming nightly?
    Thanks
    Pankaj

    Yes, in the nightly. We do this for posts and categories. I still need to fix authors. We might make the category and author “nice names” editable in the manner of the post slug.
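
The fallback mentioned above (use the post id when sanitize_title comes back empty) can be sketched roughly as follows. This is Python for illustration only, not the code that went into CVS; `sanitize_title` here is a simplified, hypothetical stand-in that keeps only ASCII letters and digits, and `make_slug` is a made-up name.

```python
import re


def sanitize_title(title: str) -> str:
    # Hypothetical stand-in: lowercase the title, collapse every run of
    # characters other than a-z and 0-9 into a hyphen, then trim hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def make_slug(title: str, post_id: int) -> str:
    # If sanitizing leaves nothing usable (for example, a title written
    # entirely in a double-byte script), fall back to the numeric post id.
    slug = sanitize_title(title)
    return slug if slug else str(post_id)


if __name__ == "__main__":
    print(make_slug("Hello World", 42))
    print(make_slug("日本語", 42))
```

The same pattern would apply to categories, with the category id taking the place of the post id.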

  • The topic ‘Permalinks in Double Byte languages’ is closed to new replies.