Relevanssi - A Better Search
Bug (and Fix): Stopwords table index (4 posts)

  1. Alexander Gieg
    Member
    Posted 2 years ago

    I've noticed a serious problem with the stopwords functionality: adding words that differ only in accented characters (for example, in Portuguese, 'pode' and 'pôde') results in only one of them actually entering the database, with the other treated as a duplicate, which clearly isn't the case. Researching the problem, I found that it comes down to Relevanssi applying WordPress' generic collation (in my case, utf8_unicode_ci) to the UNIQUE-indexed stopword column in the wp_relevanssi_stopwords table. Under that collation MySQL treats those two different words as equal, and so allows only one of them.
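    To make the failure mode concrete, here is a rough Python illustration of why a UNIQUE index under an accent-insensitive collation rejects the second word while a binary comparison accepts both. It is only an approximation: MySQL's utf8_unicode_ci uses Unicode collation weights, not the accent-stripping shown here, and the helper names accent_insensitive_key and insert_stopwords are made up for this sketch.

```python
import unicodedata


def accent_insensitive_key(word: str) -> str:
    """Rough stand-in for a case- and accent-insensitive collation key.

    Assumption: stripping combining marks approximates how an accent-insensitive
    collation compares these words; it is NOT MySQL's actual algorithm.
    """
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def insert_stopwords(words, key):
    """Mimic a UNIQUE index: a word is kept only if its key is not present yet."""
    table = {}
    for word in words:
        table.setdefault(key(word), word)
    return list(table.values())


# Accent-insensitive comparison: 'pôde' collides with 'pode' and is dropped.
ci = insert_stopwords(["pode", "pôde"], accent_insensitive_key)

# Binary comparison (what a _bin collation does): both words are kept.
binary = insert_stopwords(["pode", "pôde"], lambda w: w.encode("utf-8"))

print(ci)      # ['pode']
print(binary)  # ['pode', 'pôde']
```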

    The solution was to manually change the column collation to utf8_bin with ALTER TABLE `wp_relevanssi_stopwords` CHANGE `stopword` `stopword` VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, which preserves the UNIQUE restriction in the index while allowing words that differ only in their accented letters.

    My suggestion, then, is to change the code that creates this table so that the column uses the _bin collation of whatever character set is in use. This is safe because every MySQL character set has a corresponding _bin collation.
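    As a sketch of that idea (not the code from the pastebin), the _bin collation can be derived from the configured collation name, relying on the fact that MySQL collation names begin with their character set name. The function names bin_collation and alter_statement are hypothetical, and the generated statement simply mirrors the manual ALTER TABLE above.

```python
def bin_collation(collation: str) -> str:
    """Return the binary collation for the character set of `collation`.

    Relies on MySQL collation names starting with the character set name,
    e.g. 'utf8_unicode_ci' belongs to 'utf8'.
    """
    charset = collation.split("_", 1)[0]
    if charset == "binary":
        # The 'binary' character set has a single collation, also named 'binary'.
        return "binary"
    return f"{charset}_bin"


def alter_statement(table: str, column: str, column_type: str, collation: str) -> str:
    """Build an ALTER TABLE statement that switches one column to its _bin collation."""
    charset = collation.split("_", 1)[0]
    return (
        f"ALTER TABLE `{table}` CHANGE `{column}` `{column}` {column_type} "
        f"CHARACTER SET {charset} COLLATE {bin_collation(collation)} NOT NULL"
    )


print(bin_collation("utf8_unicode_ci"))    # utf8_bin
print(bin_collation("latin1_swedish_ci"))  # latin1_bin
print(alter_statement("wp_relevanssi_stopwords", "stopword", "VARCHAR(50)", "utf8_unicode_ci"))
```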

    Here's the pastebin with my proposed changes to the relevanssi_install() function.

    I hope this helps!

    PS: Two other tables have indexes in the same situation, so my guess is that it's safe to do the same to them. The above fix does this too.

    By the way: it'd be nice to have some way to update these tables for existing installations. Maybe a button somewhere in the interface to manually apply the three ALTER TABLE statements?

    http://wordpress.org/extend/plugins/relevanssi/

  2. Alexander Gieg
    Member
    Posted 2 years ago

    Also, a suggestion: here's a small improvement to the relevanssi_populate_stopwords() function. With it, if the non-English stopwords files were renamed to follow the WPLANG constant convention, i.e.:

    stopwords.finnish -> stopwords.fi
    stopwords.french -> stopwords.fr
    stopwords.german -> stopwords.de
    stopwords.polish -> stopwords.pl
    stopwords.portuguese -> stopwords.pt
    stopwords.spanish -> stopwords.es

    they would be loaded automatically on Relevanssi install according to the language of the current WordPress install, with no need for the user to rename the file manually.
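    The lookup described above might work roughly like this. It is a Python sketch of the selection logic only, not the actual change to relevanssi_populate_stopwords(); the function name stopwords_file_for, the default file name "stopwords", and the handling of regional values such as pt_BR are assumptions made for illustration.

```python
def stopwords_file_for(wplang: str, available: set, default: str = "stopwords") -> str:
    """Pick the stopwords file matching a WPLANG value, else fall back to the default.

    Assumptions: WPLANG looks like 'fi' or 'pt_BR', only the language part before
    the underscore is used, and `default` names the English stopwords file.
    """
    code = wplang.split("_", 1)[0].lower()
    candidate = f"stopwords.{code}"
    if code and candidate in available:
        return candidate
    return default


files = {"stopwords", "stopwords.fi", "stopwords.fr", "stopwords.de",
         "stopwords.pl", "stopwords.pt", "stopwords.es"}

print(stopwords_file_for("pt_BR", files))  # stopwords.pt
print(stopwords_file_for("fi", files))     # stopwords.fi
print(stopwords_file_for("", files))       # stopwords (WPLANG unset: English default)
print(stopwords_file_for("ja", files))     # stopwords (no matching file)
```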

  3. Mikko Saari
    Member
    Plugin Author

    Posted 2 years ago

    Thanks. I have database versioning in the plugin, so I can make the new version update existing tables.

  4. Mikko Saari
    Member
    Plugin Author

    Posted 2 years ago

    This was a bit more complicated, because at least my WP installation doesn't specify a collation. I've made it so that if the user is using utf8, the collation is set to utf8_bin; otherwise the user is out of luck. Then again, most problem cases will probably have a collation set in wp-config.php.

Topic Closed

This topic has been closed to new replies.