Hi,
I want to suggest three enhancements.
1. The current version 1.0 breaks words in the middle, and it uses strlen(), which is not multibyte-aware.
Test it with a description that has an odd number of bytes, like:
1öööööööööööööööööööööööööööööööööööööööööö
The result will contain invalid UTF-8 whenever the byte-based cut lands inside a two-byte character such as ö.
2. Languages like Chinese don’t use white space (or only rarely). In all other languages, I would rather not cut words off, but break at the last white space instead. Otherwise you may end up with a single character from the last word, which isn’t very useful.
3. The last two characters should be a non-breaking space and a real ellipsis (…), not just three dots (...).
Details matter. :)
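To make points 1 and 3 concrete, here is a minimal sketch. It assumes the mbstring extension is loaded; the variable names and the cut-off length of 10 are made up for illustration and are not taken from the plugin.

```php
<?php
// Illustration only: $description and the length 10 are made-up values.
// "\xC3\xB6" is the UTF-8 byte sequence for 'ö'.
$description = '1' . str_repeat( "\xC3\xB6", 20 ); // 1 + 20 * 2 = 41 bytes, 21 characters

// Byte-based cut: the tenth byte is the first half of a two-byte 'ö'.
$byte_cut = substr( $description, 0, 10 );

// Character-based cut: always ends on a complete character.
$char_cut = mb_substr( $description, 0, 10, 'UTF-8' );

var_dump( mb_check_encoding( $byte_cut, 'UTF-8' ) ); // bool(false)
var_dump( mb_check_encoding( $char_cut, 'UTF-8' ) ); // bool(true)

// The suffix from point 3, written as raw UTF-8 bytes.
$nbsp     = "\xC2\xA0";     // U+00A0 NO-BREAK SPACE
$ellipsis = "\xE2\x80\xA6"; // U+2026 HORIZONTAL ELLIPSIS
var_dump( mb_strlen( $nbsp . $ellipsis, 'UTF-8' ) ); // int(2)
```

The leading '1' is what shifts every ö onto an odd byte offset, so any even byte length cuts one of them in half.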
So I made some changes to the function taxonomy_short_description_shorten():
function taxonomy_short_description_shorten( $string, $length = 23, $append = '…' ) {
    $string = strip_tags( $string );
    $string = trim( $string );
    $string = html_entity_decode( $string, ENT_QUOTES, 'UTF-8' );
    $string = rtrim( $string, '-' );
    // normalize the length once, so both branches use the same value
    $length = absint( $length );
    // toscho edit
    if ( ! function_exists( 'mb_substr' ) )
    {
        // original return call (byte-based fallback without mbstring)
        return ( strlen( $string ) > $length )
            ? substr_replace( $string, $append, $length ) : $string;
    }
    // enhancements
    // count the real characters, not the bytes
    $s_length = mb_strlen( $string, 'UTF-8' );
    if ( $s_length <= $length )
    {
        return $string;
    }
    // shorten the string to max-length
    $string = mb_substr( $string, 0, $length, 'UTF-8' );
    // avoid breaks within words
    // find the last white space (the third argument is the offset, the fourth the encoding)
    $pos = mb_strrpos( $string, ' ', 0, 'UTF-8' );
    // No space? One long word. Or Chinese/Korean/Japanese text.
    if ( $pos !== FALSE )
    {
        // shorten the string to the last space
        $string = mb_substr( $string, 0, $pos, 'UTF-8' )
            // no-break space, verbose notation for readability,
            // plus a real ellipsis
            . "\xC2\xA0" . $append;
    }
    return $string;
}
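If you want to try the word-boundary logic outside of WordPress, here is a small standalone sketch. demo_shorten() is a hypothetical name, it leaves out the tag stripping and entity decoding, and it assumes the mbstring extension is loaded. It is not the plugin's code.

```php
<?php
// Hypothetical standalone helper, not part of the plugin.
function demo_shorten( $string, $length = 23, $append = "\xE2\x80\xA6" ) {
    $length = abs( (int) $length ); // stand-in for WordPress' absint()

    // Short enough already: return unchanged.
    if ( mb_strlen( $string, 'UTF-8' ) <= $length ) {
        return $string;
    }

    // Cut by characters, then look for the last space inside the cut.
    $string = mb_substr( $string, 0, $length, 'UTF-8' );
    $pos    = mb_strrpos( $string, ' ', 0, 'UTF-8' );

    if ( $pos !== false ) {
        // Drop the partial last word, add no-break space plus ellipsis.
        $string = mb_substr( $string, 0, $pos, 'UTF-8' ) . "\xC2\xA0" . $append;
    }

    return $string;
}

$short = demo_shorten( 'A short description of this term', 23 );
echo $short, "\n";
```

A string without any space inside the limit, for example one long compound word, comes back as the plain character cut without the ellipsis, the same as in the function above.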
Regards
Thomas Scholz
http://wordpress.org/extend/plugins/taxonomy-short-description/