Relevanssi - A Better Search: Highlighting of search results not working with replacement function for ß→ss

  • Resolved strontium90

    (@strontium90)


    I am using the replacement function to make ß and ss equivalent, but highlighting of search results is not working with that function enabled. I have extended it to substitute ɔ → o.

    How can I set up highlighting of search results for it?

    Thank you!

    • This topic was modified 2 months, 3 weeks ago by strontium90.
Viewing 9 replies - 1 through 9 (of 9 total)
  • Plugin Author Mikko Saari

    (@msaari)

    It’s not possible at the moment. I’ll look into this for the next version, perhaps I can improve this somehow. No promises though, because this is a difficult area.

    The ß to ss equivalence is built into Relevanssi, by the way, so the function is no longer necessary.

    strontium90

    (@strontium90)

    The search result snippet is also wrong in that case. It looks like the snippet shows the first few words of the page (or post) instead of the text around the search term.

    🙁

    Plugin Author Mikko Saari

    (@msaari)

    Yes, the excerpt-builder looks for “ss” and doesn’t find “ß”. I’ll see what I can do to improve this in the next version. I have a solution partially done already, I just need to look and see if there are any downsides to it.

    If you want to give it a go, make sure you’re running Relevanssi 4.3.4 and then replace lib/excerpts-highlights.php with this file: https://github.com/msaari/relevanssi/blob/master/lib/excerpts-highlights.php

    strontium90

    (@strontium90)

    I have copied the new excerpts-highlights.php to the lib folder, disabled the ß→ss function in the theme's functions.php, and set “Keyword matching” to “Whole words” in the Searching tab of the Relevanssi 4.3.4 options page.

    With that, highlighting of words with ß in the document seems to work, but the text snippet of the search result is still wrong. It still starts at the beginning of the page instead of with the text around the search term.

    Text snippets for search terms written in Thai script don't work either; they also start at the beginning of the page.

    • This reply was modified 2 months, 3 weeks ago by strontium90.
    Plugin Author Mikko Saari

    (@msaari)

    Can you show me an example? Are you 100% sure that you're seeing Relevanssi excerpts (i.e., the excerpt length setting still has an effect)?

    strontium90

    (@strontium90)

    Relevanssi settings pages:
    Indexing: https://postimg.cc/R6Tv8Xtv
    Searching: https://postimg.cc/gwd55dH6
    Excerpts and highlights: https://postimg.cc/rRGwqfWn

    Highlighting OK, snippet not OK
    Search for “ißt” (means “is eating”)
    https://thai.schule/?s=i%C3%9Ft

    Highlighting not OK, snippet not OK
    Search for “isst” (means “is eating”)
    https://thai.schule/?s=isst

    Highlighting OK, snippet not OK
    Search for “ไม้ใหม่ไม่ไหม้มั้ย”
    https://thai.schule/?s=%E0%B9%84%E0%B8%A1%E0%B9%89%E0%B9%83%E0%B8%AB%E0%B8%A1%E0%B9%88%E0%B9%84%E0%B8%A1%E0%B9%88%E0%B9%84%E0%B8%AB%E0%B8%A1%E0%B9%89%E0%B8%A1%E0%B8%B1%E0%B9%89%E0%B8%A2
    BTW: This text snippet should not be displayed, because it has the class “relevanssi_noindex”: https://postimg.cc/ft9PFL0Z

    Highlighting OK, snippet OK
    Search for “nâagluua”
    https://thai.schule/?s=na%CC%82agluua
    The text snippet has 9 words, as set on the settings page.

    • This reply was modified 2 months, 3 weeks ago by strontium90.
    Plugin Author Mikko Saari

    (@msaari)

    Ah, I got it. I made Relevanssi use the untokenized search terms for highlighting, but that doesn’t help alone – Relevanssi also needs to use them for excerpt-building, otherwise the excerpt won’t contain the words. Here’s another update for the lib/excerpts-highlights.php file, this should help: https://github.com/msaari/relevanssi/blob/master/lib/excerpts-highlights.php

    In order to make Thai highlighting work, the “Uncheck this if you use non-ASCII characters” option needs to be set correctly.

    The correct setting is unchecked, but it looks like there's a logic error in the Relevanssi code that reverses the option. The new excerpts-highlights.php also includes a fix for that, so Thai highlighting may start working with just the update; if it doesn't, try toggling that setting.

    “BTW: This text snippet should not be displayed, because it has the class ‘relevanssi_noindex’.”

    No, that’s not what the class does. The only thing that class does is to block Relevanssi from indexing a Gutenberg block that has the class. It has no effect on building excerpts.

    strontium90

    (@strontium90)

    Thank you @msaari! With the new excerpts-highlights.php, all of the examples above work except one:

    Search for “isst”
    https://thai.schule/?s=isst
    It finds both “isst” and “ißt”, but the text snippet and highlighting are correct only for “isst”, not for “ißt”.

    The “Uncheck this if you use non-ASCII characters” option is unchecked (see the screenshot of the settings page in my previous post), and Thai letters are working: the text snippet and highlighting are OK. Thai numerals are even matched with Arabic numerals, but those matches are not highlighted.

    Example: search for “๑๓” (means “13”):
    https://thai.schule/?s=%E0%B9%91%E0%B9%93
    Text snippets for “๑๓” and “13” are correct, but highlighting works only for “๑๓”, not for “13”.
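    A minimal sketch of the digit equivalence described above, assuming the goal is a highlighting regex that treats Thai digits (๐ to ๙, U+0E50 to U+0E59) and Arabic digits as interchangeable. The function name rlv_digit_variant_pattern() is hypothetical and not part of Relevanssi, and the thread doesn't say where such a pattern would be plugged in.

```php
<?php
// Hypothetical helper, not part of Relevanssi: builds a regex that matches
// a number written with either Arabic (0-9) or Thai (๐-๙) digits.
function rlv_digit_variant_pattern( $term ) {
	// Index = digit value, so $thai_digits[3] is '๓'.
	$thai_digits = array( '๐', '๑', '๒', '๓', '๔', '๕', '๖', '๗', '๘', '๙' );

	// Normalise Thai digits to Arabic ones, so "๑๓" and "13" give the same pattern.
	$to_arabic = array();
	foreach ( $thai_digits as $value => $thai ) {
		$to_arabic[ $thai ] = (string) $value;
	}
	$term = strtr( $term, $to_arabic );

	$pattern = '';
	// Split into UTF-8 characters rather than bytes.
	foreach ( preg_split( '//u', $term, -1, PREG_SPLIT_NO_EMPTY ) as $char ) {
		if ( preg_match( '/^[0-9]$/', $char ) ) {
			// Each digit may appear in either script.
			$pattern .= '(?:' . $char . '|' . $thai_digits[ (int) $char ] . ')';
		} else {
			// Any other character is matched literally.
			$pattern .= preg_quote( $char, '/' );
		}
	}
	return '/' . $pattern . '/u';
}
```

    Both rlv_digit_variant_pattern( '๑๓' ) and rlv_digit_variant_pattern( '13' ) return /(?:1|๑)(?:3|๓)/u, which matches “13” and “๑๓” alike.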

    How can I easily turn the debugging output for accent variations on and off?
    Screenshot of enabled debugging feature: https://postimg.cc/1gsQhJf7

    With this debugging feature enabled, I found that in the replacement function

    $replacement_arrays = apply_filters(
        'relevanssi_accents_replacement_arrays',
        array(
            'from' => array( 'a', 'c', 'e', 'i', 'o', 'u', 'n', "'" ),
            'to'   => array( '(a|á|à|â)', '(c|ç)', '(e|é|è|ê|ë)', '(i|í|ì|î|ï)', '(o|ó|ò|ô|õ)', '(u|ú|ù|ü|û)', '(n|ñ)', "('|’)?" ),
        )
    );

    some characters are missing:

    ǎ U+01CE
    ě U+011B
    ǐ U+01D0
    ǒ U+01D2
    ǔ U+01D4

    ü has its own variants (ǜ U+01DC, ǘ U+01D8, ǚ U+01DA) and should not be treated as a variant of u:

    ü → ü|ǜ|ǘ|ǚ

    Also, I would like to remove some combining diacritical marks that are used as tone marks in romanised Thai:

    x̀ U+0300 (combining grave accent)
    x̂ U+0302 (combining circumflex accent)
    x́ U+0301 (combining acute accent)
    x̌ U+030C (combining caron)

    x is only a placeholder; it can be a, ä, e, i, o, ɔ, ö, u or ü. See the character map: https://thai.schule/zeichentabelle/
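    A minimal sketch of removing just those four combining marks from a string. It assumes the text stores the marks as separate code points (decomposed), which the example URL above suggests: na%CC%82agluua is “n”, “a”, then U+0302. rlv_strip_tone_marks() is a hypothetical name; which Relevanssi filter to attach it to isn't covered in this thread, and it would have to run on both the indexed content and the search query for them to match.

```php
<?php
// Hypothetical helper, not part of Relevanssi: removes the combining tone
// marks U+0300 (grave), U+0301 (acute), U+0302 (circumflex) and U+030C (caron).
function rlv_strip_tone_marks( $text ) {
	// The /u modifier makes PCRE treat the subject and \x{...} escapes as UTF-8.
	return preg_replace( '/[\x{0300}-\x{0302}\x{030C}]/u', '', $text );
}
```

    Precomposed characters such as “â” (U+00E2) are single code points and are left unchanged; handling them would need normalising to decomposed form first, e.g. with Normalizer::normalize( $text, Normalizer::FORM_D ) from PHP's intl extension, which would then also strip those marks from letters like é.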

    • This reply was modified 2 months, 2 weeks ago by strontium90.
    Plugin Author Mikko Saari

    (@msaari)

    There's no debugging feature; the code just had a leftover var_dump() in it. Either edit the file to remove the line with var_dump() on it, or get a new version of the file from GitHub.

    It's not easy to make the search term “isst” match “ißt” in the document. A 1-to-1 character conversion would be easy, but matching two letters to one is really difficult with the current setup, especially as “ss” should match “ß” only in some cases. However, here's one solution:

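    // Replace every "ss" in the query used for building excerpts with "ß";
    // Relevanssi then looks for both spellings when creating excerpts.
    add_filter( 'relevanssi_excerpt_query', 'rlv_ss_filter' );
    function rlv_ss_filter( $query ) {
    	return str_replace( 'ss', 'ß', $query );
    }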

    This will automatically convert all “ss” in the search queries to “ß”, which leads to Relevanssi searching for both versions when creating excerpts.
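    To illustrate why a two-letters-to-one mapping needs special handling, here's a minimal sketch, independent of Relevanssi's actual excerpt code, that turns every “ss” or “ß” in a term into the alternation (?:ss|ß), so that one regex matches both spellings. rlv_ss_variant_pattern() is a hypothetical name.

```php
<?php
// Hypothetical helper, not Relevanssi's excerpt code: builds a regex in
// which "ss" and "ß" are interchangeable, so "isst" also matches "ißt".
function rlv_ss_variant_pattern( $term ) {
	// Quote regex metacharacters first; "s" and "ß" are not affected.
	$quoted = preg_quote( $term, '/' );
	// strtr() scans the string once, so the inserted "(?:ss|ß)" is never
	// itself replaced again.
	$pattern = strtr( $quoted, array( 'ss' => '(?:ss|ß)', 'ß' => '(?:ss|ß)' ) );
	// /i for case-insensitive matching, /u so "ß" is handled as one UTF-8 character.
	return '/' . $pattern . '/iu';
}
```

    Like the relevanssi_excerpt_query filter above, this treats every “ss” as a possible “ß”, which is only correct in some words, so it errs toward matching too much.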

    The accent variations list covers the most common European-language use cases, so I'm not going to add Thai accents to it (Relevanssi can't really be used for longer Thai texts, because Thai doesn't put spaces between words, so adding specific Thai support is pointless). However, because people have different needs, the accent variations can be modified with a filter. It works like this:

    add_filter( 'relevanssi_accents_replacement_arrays', 'rlv_thai_accents' );
    function rlv_thai_accents( $arrays ) {
        // Each 'from' entry is paired with the 'to' entry at the same index.
        $arrays['from'] = array( 'a', 'c', 'e', 'i', 'o', 'u', 'n', "'" );
        $arrays['to'] = array( '(a|á|à|â)', '(c|ç)', '(e|é|è|ê|ë)', '(i|í|ì|î|ï)', '(o|ó|ò|ô|õ)', '(u|ú|ù|ü|û)', '(n|ñ)', "('|’)?" );
        return $arrays;
    }

    You can modify the arrays as much as you want, you can remove the replacements you don’t need, and add new replacements. Just make sure the entries in the arrays match each other, so that the first item in the “from” array matches the first item in the “to” array.
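    Applied to the characters listed earlier in the thread (ǎ, ě, ǐ, ǒ, ǔ, plus ü with its own variants), the filter might look like the sketch below. It assumes, as the parallel-array shape suggests, that each “from” string is replaced by the “to” string at the same index, one pair after another; rlv_romanised_thai_accents() is a hypothetical name. The function_exists() guard only lets the snippet also run outside WordPress for testing.

```php
<?php
// Hypothetical function name. Adds caron variants (ǎ ě ǐ ǒ ǔ) and gives ü
// its own variants (ǜ ǘ ǚ) instead of treating it as a variant of u.
function rlv_romanised_thai_accents( $arrays ) {
	$arrays['from'] = array( 'a', 'c', 'e', 'i', 'o', 'u', 'ü', 'n', "'" );
	$arrays['to']   = array(
		'(a|á|à|â|ǎ)',
		'(c|ç)',
		'(e|é|è|ê|ë|ě)',
		'(i|í|ì|î|ï|ǐ)',
		'(o|ó|ò|ô|õ|ǒ)',
		'(u|ú|ù|û|ǔ)', // ü is no longer listed as a variant of u...
		'(ü|ǜ|ǘ|ǚ)',   // ...it has its own entry instead.
		'(n|ñ)',
		"('|’)?",
	);
	return $arrays;
}

// Register the filter only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'relevanssi_accents_replacement_arrays', 'rlv_romanised_thai_accents' );
}
```

    If the pairs are applied in sequence (as assumed), a “to” string must not contain a letter that a later “from” entry would replace again; none of the strings above do.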
