Simple Tags
Automatic tagging does not work with multibyte charsets (e.g., UTF-8) (1 post)

  1. Vladimir Kolesnikov
    Member
    Posted 3 years ago #

    The problem lies in the regular expression that the plugin uses to find tags in the post:

    $match = "/\b" . preg_quote($term_name, "/") . "\b/".$case;

    \b (word boundary) matches at a position between a word character and a non-word character. By default PCRE works on bytes and treats only ASCII letters, digits and the underscore as word characters, so the bytes of a UTF-8 encoded Cyrillic letter are not considered letters and the plugin fails to match Cyrillic tags.

    The check is easy:

    php -r 'setlocale(LC_ALL, "ru_RU.UTF-8"); echo (int)preg_match("/\bспорт\b/", "это спорт"), "\n";'

    This prints 0, meaning no match, even with a UTF-8 locale set.
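
The same check can be sketched as a short script that puts the byte-oriented pattern next to a Unicode-aware one. The variable names are illustrative only, and the script assumes it is saved as UTF-8 and that PCRE has UTF-8 and Unicode property support.

```php
<?php
// Illustrative names; none of this is part of Simple Tags.
$text = 'это спорт';
$term = 'спорт';

// In UTF-8 each Cyrillic letter used here takes two bytes.
$bytes = strlen('с'); // byte length, not character count

// Byte mode: by default \w covers only ASCII letters, digits and "_",
// so this string contains no word/non-word transition for \b to match.
$byte_mode = preg_match('/\b' . preg_quote($term, '/') . '\b/', $text);

// UTF-8 mode: \PL tests whole characters for "not a Unicode letter".
$utf8_mode = preg_match('/(\PL|\A)' . preg_quote($term, '/') . '(\PL|\Z)/u', $text);

var_dump($bytes, $byte_mode, $utf8_mode);
```

On a build with the assumed PCRE features, the byte-mode result should be 0 and the UTF-8 mode result 1.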

    Here's the patch that fixes the problem:

    diff -uwdBrN simple-tags.orig/inc/client.php simple-tags/inc/client.php
    --- simple-tags.orig/inc/client.php	2010-08-11 10:37:39.000000000 +0300
    +++ simple-tags/inc/client.php	2010-09-13 06:20:40.000000000 +0300
    @@ -157,8 +157,9 @@
     				}
    
     				$filtered = ''; // will filter text token by token
    -				$match = "/\b" . preg_quote($term_name, "/") . "\b/".$case;
    -				$substitute = '<a href="'.$term_link.'" class="st_tag internal_tag" '.$rel.' title="'. esc_attr( sprintf( __('Posts tagged with %s', 'simpletags'), $term_name ) )."\">$0</a>";
    +				$quoted = preg_quote($term_name, "/");
    +				$match = '/(\PL|\A)(' . $quoted . ')(\PL|\Z)/u'.$case;
    +				$substitute = '$1<a href="'.$term_link.'" class="st_tag internal_tag" '.$rel.' title="'. esc_attr( sprintf( __('Posts tagged with %s', 'simpletags'), $term_name ) )."\">$2</a>$3";
    
     				// for efficiency only tokenize if forced to do so
     				if ( $must_tokenize ) {
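
To try the patched pattern outside the plugin, here is a minimal stand-alone sketch. The function name `link_term` and the `$ignore_case` flag (standing in for the plugin's `$case`) are assumptions, `htmlspecialchars()` replaces WordPress's `esc_attr()`, and the link markup is simplified; it is not the plugin's actual code path.

```php
<?php
// Hypothetical stand-alone helper, not the plugin's API.
function link_term(string $text, string $term_name, string $term_link, bool $ignore_case = false): string
{
    $quoted = preg_quote($term_name, '/');
    // Pattern from the patch: a non-letter (or start/end of text) on each side.
    $match  = '/(\PL|\A)(' . $quoted . ')(\PL|\Z)/u' . ($ignore_case ? 'i' : '');
    $href   = htmlspecialchars($term_link, ENT_QUOTES);

    $result = preg_replace_callback(
        $match,
        function (array $m) use ($href): string {
            // $m[1] and $m[3] are the neighbours consumed by the pattern; put them back.
            return $m[1] . '<a href="' . $href . '">' . $m[2] . '</a>' . $m[3];
        },
        $text
    );

    // preg_replace_callback() returns null on failure, e.g. invalid UTF-8 input.
    return $result ?? $text;
}

echo link_term('это спорт', 'спорт', 'http://example.com/tag/sport'), "\n";
```

Because the pattern consumes the non-letter on each side, two occurrences separated by a single character (as in 'спорт спорт') get only the first one linked. Lookarounds such as `(?<!\pL)` and `(?!\pL)` would be one way around that, although the patch does not use them.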

    However, the patch works only if the PCRE library used by PHP is compiled with UTF-8 support (the /u modifier); the \PL class additionally needs Unicode property support.
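
One possible way to guard against a PCRE build without that support is to probe for it at run time and fall back to the original pattern. This is only a sketch: the helper name `pcre_supports_unicode` is hypothetical and not something Simple Tags provides.

```php
<?php
// Hypothetical helper: true when preg_* accepts the /u modifier
// and Unicode property escapes such as \pL.
function pcre_supports_unicode(): bool
{
    // preg_match() returns false (with a warning, silenced by @)
    // when the pattern cannot be compiled.
    return @preg_match('/\pL/u', 'a') === 1;
}

$quoted = preg_quote('спорт', '/');
$match = pcre_supports_unicode()
    ? '/(\PL|\A)(' . $quoted . ')(\PL|\Z)/u'  // pattern from the patch
    : '/\b' . $quoted . '\b/';                 // original pattern as a fallback
```

The replacement string has to follow the pattern that was chosen: `$1…$2…$3` for the patched pattern, `$0` for the original one.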

    http://wordpress.org/extend/plugins/simple-tags/

About this Topic

  • Started 3 years ago by Vladimir Kolesnikov
  • This topic is not a support question
  • WordPress version: 3.0.1