The problem is in the regular expression used to match tags in the post:
$match = "/\b" . preg_quote($term_name, "/") . "\b/".$case;
\b (word boundary) matches at a position between a word character and a non-word character. Because the pattern has no /u modifier, PCRE treats the text as single bytes rather than UTF-8 characters, and the bytes that make up Cyrillic letters are not considered word characters. As a result the plugin fails to match Cyrillic tags.
The check is easy:
php -r 'setlocale(LC_ALL, "ru_RU.UTF-8"); echo (int)preg_match("/\bспорт\b/", "это спорт"), "\n";'
prints 0 (no match) when run, even though a Russian UTF-8 locale is set.
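To see why \b finds no boundary, it can help to look at a single Cyrillic letter the way PCRE does. The following is only an illustrative sketch: the variable names are made up, and the locale is pinned to "C" so that the byte-mode result does not depend on the environment.

```php
<?php
// Pin the character-type locale so the byte-mode result is reproducible.
setlocale(LC_CTYPE, 'C');

// Cyrillic letter "es" (U+0441), written as its two UTF-8 bytes.
$letter = "\xD1\x81";

// strlen() counts bytes, not characters.
$byteLength = strlen($letter);

// Byte mode (no /u): does any single byte count as a word character?
$isWordByte = preg_match('/\w/', $letter);

// UTF-8 mode: does the string contain a character with the Unicode
// "letter" property? Yields false if PCRE lacks Unicode support.
$isLetterUnicode = @preg_match('/\pL/u', $letter);

echo "bytes: ", $byteLength, "\n";
echo "matches \\w without /u: ", var_export($isWordByte, true), "\n";
echo "matches \\pL with /u: ", var_export($isLetterUnicode, true), "\n";
```

Since neither byte is a word character in byte mode, there is no word/non-word transition around a Cyrillic tag for \b to latch onto.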
Here's the patch that fixes the problem:
diff -uwdBrN simple-tags.orig/inc/client.php simple-tags/inc/client.php
--- simple-tags.orig/inc/client.php 2010-08-11 10:37:39.000000000 +0300
+++ simple-tags/inc/client.php 2010-09-13 06:20:40.000000000 +0300
@@ -157,8 +157,9 @@
}
$filtered = ''; // will filter text token by token
- $match = "/\b" . preg_quote($term_name, "/") . "\b/".$case;
- $substitute = '<a href="'.$term_link.'" class="st_tag internal_tag" '.$rel.' title="'. esc_attr( sprintf( __('Posts tagged with %s', 'simpletags'), $term_name ) )."\">$0</a>";
+ $quoted = preg_quote($term_name, "/");
+ $match = '/(\PL|\A)(' . $quoted . ')(\PL|\Z)/u'.$case;
+ $substitute = '$1<a href="'.$term_link.'" class="st_tag internal_tag" '.$rel.' title="'. esc_attr( sprintf( __('Posts tagged with %s', 'simpletags'), $term_name ) )."\">$2</a>$3";
// for efficiency only tokenize if forced to do so
if ( $must_tokenize ) {
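Note, however, that the patch works only if the PCRE library used by PHP was compiled with UTF-8 support. The /u modifier requires it, and \PL additionally requires Unicode property support.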
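One possible way to cope with a PCRE build that lacks this support is to probe for it at runtime and fall back to the original \b pattern. The sketch below is not part of Simple Tags: st_build_tag_pattern() is a hypothetical helper, and $case is assumed to be a modifier string such as '' or 'i', as in the plugin code above. It also uses lookarounds instead of the capturing groups in the patch, so the neighbouring characters are not consumed and two tags separated by a single character can both match in one pass.

```php
<?php
/**
 * Hypothetical helper: build a whole-word pattern for a tag name.
 *
 * @param string $term_name Tag text, UTF-8 encoded.
 * @param string $case      Extra pattern modifiers, e.g. 'i' or '' (assumed).
 * @return string           A PCRE pattern including delimiters and modifiers.
 */
function st_build_tag_pattern($term_name, $case = '')
{
    $quoted = preg_quote($term_name, '/');

    // Probe: compiling a /u pattern with \pL fails (preg_match returns
    // false) when PCRE was built without UTF-8 / Unicode property support.
    $has_unicode = (@preg_match('/\pL/u', 'a') === 1);

    if ($has_unicode) {
        // Assert "no letter immediately before or after" without
        // consuming the neighbouring characters.
        return '/(?<!\pL)' . $quoted . '(?!\pL)/u' . $case;
    }

    // Fallback: the original byte-oriented pattern (ASCII tags only).
    return '/\b' . $quoted . '\b/' . $case;
}

// Usage: wrap every whole-word occurrence of the tag in brackets.
$pattern = st_build_tag_pattern('спорт');
echo preg_replace($pattern, '[$0]', 'это спорт, спорт и транспорт'), "\n";
```

Like \PL in the patch, this treats digits and underscores as boundaries, which differs slightly from \b. Whether that is desirable depends on how tags are expected to appear in posts.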