A few hours ago, I changed my blog’s theme to P2. A few minutes ago, I noticed that P2’s own Recent Comments does not process UTF-8 strings correctly. See this captured image and you’ll find the replacement character: �. So I traced the function calls and found that the problem originates in wp_html_excerpt().
Inside the function, mb_substr() is used to cut the string down to the given length, like this:
$str = mb_substr( $str, 0, $count );
My other PHP applications also use mb_substr(), but with one difference: I always specify the encoding parameter.
So I added the parameter:
$str = mb_substr( $str, 0, $count, 'UTF-8' );
After this, everything is green.
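For anyone who wants to reproduce this outside WordPress, here is a minimal sketch. It assumes the cause is an mbstring internal encoding other than UTF-8 on the server. I force ISO-8859-1 to simulate that, and the sample string and length are made up, so your environment may behave differently.

```php
<?php
// Save this file as UTF-8.
// Assumption: simulate a server whose mbstring internal encoding is not UTF-8.
mb_internal_encoding( 'ISO-8859-1' );

$str   = 'héllo wörld'; // made-up sample; "é" takes two bytes in UTF-8
$count = 2;

// No encoding argument: mbstring falls back to the internal encoding,
// so every byte counts as one character and "é" is cut in half.
$implicit = mb_substr( $str, 0, $count );

// Explicit encoding: characters are counted as UTF-8, so "é" stays whole.
$explicit = mb_substr( $str, 0, $count, 'UTF-8' );

var_dump( mb_check_encoding( $implicit, 'UTF-8' ) ); // bool(false)
var_dump( mb_check_encoding( $explicit, 'UTF-8' ) ); // bool(true)
```

The first result is the kind of broken string that a browser shows as �.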
I don’t know why the WP developers omitted the parameter, but adding it also repairs the Permalink section underneath the title field on the ‘Edit Post’ page. Usually I don’t touch WP built-in functions, but this is a serious issue (this time, unlike the Permalink section in the admin page, the broken characters are visible to the public), so I reluctantly had to modify the function.
I hope to see this issue solved in the next version.
I found the backward-compatibility code in /wp-includes/compat.php. Now I see why the encoding parameter was omitted: the _mb_substr() function processes only UTF-8. But I still recommend adding the parameter, in case the real mb_substr() behaves strangely in some environments, as I described above.
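To illustrate what a UTF-8-only fallback can look like, here is a simplified sketch. It is not the actual compat.php code; the function name is hypothetical, and it uses a regex with the u modifier to split the string into whole UTF-8 characters, which is why it never needs an encoding argument.

```php
<?php
// Save this file as UTF-8.
// Hypothetical name: a simplified UTF-8-only substring, NOT the real compat.php code.
function sketch_utf8_substr( $str, $start, $length = null ) {
    // With the /u modifier "." matches one whole UTF-8 character;
    // /s lets it match newlines as well.
    preg_match_all( '/./us', $str, $match );

    // Slice the array of characters; a null length means "to the end".
    $chars = array_slice( $match[0], $start, $length );

    return implode( '', $chars );
}

echo sketch_utf8_substr( 'héllo wörld', 0, 2 ), "\n"; // hé
echo sketch_utf8_substr( 'héllo', 3 ), "\n";          // lo
```

Because a function like this ignores the mbstring settings, it always treats its input as UTF-8, while the real mb_substr() depends on the encoding it is given or configured with.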
Also, don’t forget to add the parameter to mb_strlen(), because it affects how the permalink is abridged in the Permalink section of the ‘Edit Post’ page.
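As a sketch of the general pattern (the helper name, suffix, and sample string are hypothetical, and this is not WordPress’s own abridgement code), passing 'UTF-8' to both the length check and the cut keeps them consistent with each other:

```php
<?php
// Save this file as UTF-8.
// Hypothetical helper, not part of WordPress.
function my_abridge_utf8( $str, $max, $suffix = '...' ) {
    // Measure in UTF-8 characters, not in bytes.
    if ( mb_strlen( $str, 'UTF-8' ) <= $max ) {
        return $str;
    }
    // Cut on UTF-8 character boundaries, then mark the cut.
    return mb_substr( $str, 0, $max, 'UTF-8' ) . $suffix;
}

// Assumption: simulate a non-UTF-8 internal encoding, as in my environment.
mb_internal_encoding( 'ISO-8859-1' );

// Without the encoding argument, the 5-character string is counted as 6 bytes.
var_dump( mb_strlen( 'héllo' ) );          // int(6)
var_dump( mb_strlen( 'héllo', 'UTF-8' ) ); // int(5)

$short = my_abridge_utf8( 'héllo wörld', 5 );
echo $short, "\n"; // héllo...
```

If only mb_substr() gets the parameter, the length check can still decide to shorten a string that actually fits.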
- The topic ‘Serious UTF-8 related issue in wp_html_excerpt() function’ is closed to new replies.