Serious UTF-8 related issue in wp_html_excerpt() function

  • A few hours ago, I switched my blog’s theme to P2. A few minutes ago, I noticed that P2’s own Recent Comments does not process UTF-8 strings correctly. See this captured image and you’ll find the replacement character: �. I traced the function calls and found that the problem originates in the wp_html_excerpt() function.

    Inside the function, mb_substr() is used to slice the string to the given length, like this: $str = mb_substr( $str, 0, $count );.
    My other PHP applications also use mb_substr(), with one difference: I always specify the encoding parameter. Without it, mb_substr() falls back to the internal encoding set by mb_internal_encoding(), which is not necessarily UTF-8.
    So I added the parameter: $str = mb_substr( $str, 0, $count, 'UTF-8' );. After this, everything works correctly.

    I don’t know why the WP developers omitted the parameter, but adding it also fixes the Permalink section underneath the title field on the ‘Edit Post’ page. I usually don’t touch WP built-in functions, but this is a serious issue (this time, unlike the permalink section in the admin page, the broken characters are visible to the public), so I reluctantly modified the function.
    I hope to see this issue fixed in the next version.

    I found the backward-compatibility code in /wp-includes/compat.php. Now I see why the encoding parameter was omitted: the fallback _mb_substr() function handles only UTF-8. Still, I recommend adding the parameter for the case where the real mb_substr() exists, because the real mb_substr() behaves strangely in some environments, as I described above.
    Also, don’t forget to add the parameter to mb_strlen(), because it affects how the permalink is abridged in the Permalink section on the ‘Edit Post’ page.
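The fix described in this post can be sketched as follows. This is a minimal, hypothetical PHP example, not WordPress core code. It assumes a server whose mbstring internal encoding is not UTF-8 (forced here with mb_internal_encoding('ISO-8859-1')), which is one plausible explanation for the broken output. The names my_html_excerpt() and my_abridge_slug() are illustrative, and the 30/14 abridgement limits are arbitrary choices, not values taken from WordPress.

```php
<?php
// Assumption for illustration: the server's mbstring internal encoding is not
// UTF-8. Modern PHP usually defaults to UTF-8, so we force ISO-8859-1 here.
mb_internal_encoding('ISO-8859-1');

$comment = '한국어 댓글입니다'; // UTF-8 text; each Hangul syllable is 3 bytes

// Without an encoding argument, mb_substr() uses mb_internal_encoding(),
// so here it counts bytes and can cut a multi-byte character in half.
$broken = mb_substr($comment, 0, 4);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// With 'UTF-8' passed explicitly, it counts characters regardless of the
// internal encoding.
var_dump(mb_substr($comment, 0, 4, 'UTF-8')); // string(10) "한국어 "

// Hypothetical helper, loosely modeled on the behavior described in this
// thread (not the actual WordPress source).
function my_html_excerpt(string $str, int $count): string {
    $str = strip_tags($str);
    return mb_substr($str, 0, $count, 'UTF-8');
}

// Hypothetical helper: abridge a long slug as head + "…" + tail, using
// mb_strlen()/mb_substr() with an explicit encoding so it counts characters.
function my_abridge_slug(string $slug, int $max = 30, int $keep = 14): string {
    if (mb_strlen($slug, 'UTF-8') <= $max) {
        return $slug;
    }
    return mb_substr($slug, 0, $keep, 'UTF-8') . '…'
        . mb_substr($slug, -$keep, null, 'UTF-8');
}

echo my_html_excerpt('<b>한국어</b> 댓글입니다', 5), "\n"; // 한국어 댓
```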
