WordPress.org

SEO Auto Linker
Regex pattern incorrect (4 posts)

  1. bouncesquad
    Member
    Posted 2 years ago #

    Problem: The plugin won't auto-link keywords that have question marks or other punctuation at the beginning or end of the keyword. The same issue may also affect keywords that begin or end with non-ASCII (Unicode) characters.

    Cause: get_kw_regex() in inc/front.php uses an incorrect pattern. The pattern used is:


    return sprintf('/(\b)(%s)(\b)/ui', implode('|', $keywords));

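    \b matches a word boundary: a position where a word character and a non-word character are adjacent (or the start or end of the string next to a word character). By default, "word" characters only include [A-Za-z0-9_]. So a keyword ending in "?" produces no word boundary after it when the next character is a space, more punctuation, or the end of the text, because "?" is not a word character.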
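
    A minimal sketch of the failure described here. It assumes the keyword is escaped with preg_quote() before it goes into the pattern (the plugin's own escaping is not shown in this thread), and the keyword and sample text are made up for illustration:

```php
<?php
// Hypothetical keyword and sample text, for illustration only.
$keyword = 'what is seo?';
$text    = 'People often ask what is seo? The answer varies.';

// Same shape as the pattern quoted above, with the keyword escaped via preg_quote().
$pattern = sprintf('/(\b)(%s)(\b)/ui', preg_quote($keyword, '/'));

// "?" is followed by a space: both are non-word characters,
// so there is no word boundary for the trailing \b to match.
$withPunctuation = preg_match($pattern, $text);

// The same keyword without the trailing "?" ends in a word character,
// so a boundary exists between "o" and "?".
$plainPattern = sprintf('/(\b)(%s)(\b)/ui', preg_quote('what is seo', '/'));
$withoutPunctuation = preg_match($plainPattern, $text);

var_dump($withPunctuation, $withoutPunctuation); // int(0), int(1)
```

    If the character right after the "?" happened to be a letter or digit, the trailing \b would match, so the failure depends on what surrounds the keyword in the post content.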

    Solution: Instead of using word boundaries, just use non-word characters:


    return sprintf('/(\W)(%s)(\W)/ui', implode('|', $keywords));

    I imagine this could be the cause of the reported Unicode problems too, since a keyword beginning or ending with a Unicode character would not create a word boundary either.
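
    One caveat with the \W version is that \W has to consume an actual character, so a keyword sitting at the very start or end of the text has nothing to match against. A possible alternative, sketched here as an assumption rather than anything taken from the plugin, is to use lookarounds that only assert "no word character on this side". The keyword and text are hypothetical:

```php
<?php
// Hypothetical keyword and text; the keyword sits at the very end of the string.
$keyword = 'what is seo?';
$text    = 'People often ask what is seo?';
$quoted  = preg_quote($keyword, '/');

// \W needs a real non-word character on both sides,
// so nothing matches when the keyword ends the string.
$nonWord = preg_match(sprintf('/(\W)(%s)(\W)/ui', $quoted), $text);

// Lookarounds check the neighbouring position without consuming a character,
// so they also succeed at the start and end of the string.
$lookaround = preg_match(sprintf('/(?<!\w)(%s)(?!\w)/ui', $quoted), $text, $m);

var_dump($nonWord, $lookaround, $m[1]); // int(0), int(1), string "what is seo?"
```

    Note that the lookaround pattern has only one capture group, so any replacement code that relies on the keyword being group 2 would need adjusting.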

    http://wordpress.org/extend/plugins/seo-auto-linker/

  2. chrisguitarguy
    Member
    Plugin Author

    Posted 2 years ago #

    A word boundary is a position in the subject string where the current character and the previous character do not both match \w or \W (i.e. one matches \w and the other matches \W), or the start or end of the string if the first or last character matches \w, respectively.

    And

    A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

    It would seem that \b uses \w and \W? I also suspect that both \W and \b change when the unicode flag is specified?

    Source: http://php.net/manual/en/regexp.reference.escape.php

  3. bouncesquad
    Member
    Posted 2 years ago #

    I didn't test the Unicode theory, just punctuation. But \b, \B, \w, \W probably don't change when /u is used. See point 6 on this page:

    http://www.exim.org/viewvc/pcre/code/trunk/doc/html/pcreunicode.html?view=co

    That's the default behavior of PCRE even in UTF-8 mode (it only changes when the PCRE_UCP option is set), but the PHP manual doesn't say exactly what /u changes.
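
    Since the manual leaves this open, the simplest way to settle it for a given server is to run a quick check there. The sketch below is only a diagnostic: the test character is an arbitrary choice, and no expected results are listed because they can depend on the PHP and PCRE versions and on the locale in use.

```php
<?php
// "é" encoded as UTF-8 (two bytes), written with byte escapes
// so the snippet does not depend on the file's own encoding.
$char = "\xC3\xA9";

$results = array(
    'PHP version'   => PHP_VERSION,
    'PCRE version'  => PCRE_VERSION,
    // Without /u the subject is treated as individual bytes.
    '\w without /u' => preg_match('/\w/', $char),
    // With /u the subject is treated as UTF-8 characters.
    '\w with /u'    => preg_match('/^\w$/u', $char),
    // \b can only match here if "é" counts as a word character.
    '\b with /u'    => preg_match('/\b/u', $char),
);

print_r($results);
```

    A result of 1 for the two /u rows would mean that \w and \b treat non-ASCII letters as word characters on that setup; 0 would mean they stay ASCII-only.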

  4. chrisguitarguy
    Member
    Plugin Author

    Posted 2 years ago #

    I just tested with a "?" and other punctuation, and it seems to work fine on my local server (nginx + PHP 5.3.10).

    I'm not sure. This unicode and regex stuff is the painful part of this plugin. :/

Topic Closed

This topic has been closed to new replies.
