[Plugin: Relevanssi – A Better Search] Bug (and Fix): Stopwords table index

  • I’ve noticed a serious problem with the stopwords functionality: adding words that differ only in accented characters (for example, in Portuguese, ‘pode’ and ‘pôde’) results in only one of them actually entering the database, the other being treated as a duplicate, which clearly isn’t the case. Researching the problem, I found it comes down to Relevanssi applying WordPress’ generic collation (in my case, utf8_unicode_ci) to the UNIQUE-indexed stopword column in the wp_relevanssi_stopwords table. Under that collation MySQL considers those two different words equal, and so allows only one of them.
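To make the mechanism concrete, here is a minimal sketch that reproduces the effect outside MySQL. It assumes SQLite as a stand-in, and the `ACCENT_CI` collation is a hypothetical approximation of utf8_unicode_ci (it folds case and strips accents); the `INSERT OR IGNORE` is only there to show which rows survive, not a claim about how Relevanssi inserts stopwords.

```python
import sqlite3
import unicodedata


def fold(text):
    """Lower-case and strip combining accents, so 'pôde' folds to 'pode'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_ci(a, b):
    """Hypothetical collation: compare strings after case and accent folding."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


def insert_stopwords(collation, words):
    """Insert words into a UNIQUE column under `collation`; return the rows kept."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("ACCENT_CI", accent_ci)
    conn.execute(
        f"CREATE TABLE stopwords (stopword TEXT COLLATE {collation} NOT NULL UNIQUE)"
    )
    for word in words:
        # Rows that collide in the UNIQUE index are silently dropped.
        conn.execute("INSERT OR IGNORE INTO stopwords VALUES (?)", (word,))
    rows = [r[0] for r in conn.execute("SELECT stopword FROM stopwords ORDER BY rowid")]
    conn.close()
    return rows
```

With the accent-insensitive collation, `insert_stopwords("ACCENT_CI", ["pode", "pôde"])` keeps only `pode`; with `BINARY` (the analogue of a _bin collation) both words are stored.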

    The solution was to manually change the column collation to utf8_bin with ALTER TABLE `wp_relevanssi_stopwords` CHANGE `stopword` `stopword` VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, thus preserving the UNIQUE restriction in the index while allowing for words that differ only in their accented letters. (Note that the identifiers need backticks; MySQL reads single-quoted names as string literals.)

    My suggestion then is to change the code that creates this table so that its collation becomes whatever_bin, where "whatever" is the character set in use. This should be safe because MySQL's character sets each have a _bin collation.
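As a rough illustration of the "whatever_bin" idea, the sketch below derives the binary collation name from a character set and builds the corresponding ALTER TABLE statement. The helper names are hypothetical, and it assumes a non-binary character set such as utf8; it only builds the SQL string and does not touch a database.

```python
def bin_collation(charset):
    """Return the binary collation name for a character set, e.g. utf8 -> utf8_bin."""
    return f"{charset}_bin"


def alter_stopword_column(table, charset):
    """Build the ALTER TABLE statement from the post for a given table and charset."""
    return (
        f"ALTER TABLE `{table}` CHANGE `stopword` `stopword` VARCHAR(50) "
        f"CHARACTER SET {charset} COLLATE {bin_collation(charset)} NOT NULL"
    )


print(alter_stopword_column("wp_relevanssi_stopwords", "utf8"))
```

The table name and column definition are the ones quoted in the post; a real install may use a different table prefix than `wp_`.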

    Here’s the pastebin with my proposed changes to the relevanssi_install() function.

    I hope this helps!

    PS: Two other tables have indexes in the same situation, so my guess is that it’s safe to do the same to them. The above fix does this too.

    By the way: it’d be nice to have some way to update these tables for existing installations. Maybe a button somewhere in the interface to manually apply the three ALTER TABLE statements?

Viewing 3 replies - 1 through 3 (of 3 total)
  • Also, a suggestion: here’s a small improvement to the relevanssi_populate_stopwords() function. With it, if the non-English stopwords files were renamed to follow the WPLANG constant convention, i.e.:

    stopwords.finnish ->
    stopwords.french ->
    stopwords.german ->
    stopwords.polish ->
    stopwords.portuguese ->
    stopwords.spanish ->

    they would be loaded automatically when Relevanssi is installed, following the language of the current WordPress install, without the user having to rename the file manually.
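The selection logic described here could look roughly like the sketch below. Everything in it is an assumption for illustration: the target file names are not shown in the post, so the `stopwords.<locale>` pattern, the `default_name` fallback, and the function name are all hypothetical, and the actual change to relevanssi_populate_stopwords() may differ.

```python
from pathlib import Path


def pick_stopwords_file(directory, locale, default_name="stopwords"):
    """Return the locale-specific stopwords file if it exists, else the default one."""
    directory = Path(directory)
    if locale:
        # Hypothetical naming pattern: stopwords.<locale>
        candidate = directory / f"stopwords.{locale}"
        if candidate.is_file():
            return candidate
    # No locale set, or no matching file: fall back to the default list.
    return directory / default_name
```

The fallback matters because WPLANG is typically empty on English installs, in which case the default list should still be loaded.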

    Plugin Author Mikko Saari


    Thanks. I have database versioning in the plugin, so I can make the new version update existing tables.

    Plugin Author Mikko Saari


    This was a bit more complicated, because at least my WP installation doesn’t specify a collation. I’ve made it so that if the user is using utf8, the collation is set to utf8_bin; otherwise the user will be out of luck. Then again, most problem cases will probably have a collation set in wp-config.php.
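Read together with the earlier note about database versioning, the upgrade decision might be sketched as follows. This is only an interpretation of the two replies: the function name and the integer version numbers are hypothetical, and the plugin's actual upgrade code is not shown in this thread.

```python
def plan_upgrade(installed_db_version, new_db_version, charset):
    """Return the collation to apply during an upgrade, or None if nothing changes."""
    if installed_db_version >= new_db_version:
        return None  # tables are already at the current schema version
    if charset != "utf8":
        return None  # non-utf8 installs are left as they are, per the reply
    return "utf8_bin"
```

For example, `plan_upgrade(1, 2, "utf8")` returns `"utf8_bin"`, while a latin1 install or an already-upgraded one returns `None`.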

  • The topic ‘[Plugin: Relevanssi – A Better Search] Bug (and Fix): Stopwords table index’ is closed to new replies.