Relevanssi - A Better Search: How to remove combining diacritical marks?

  • I am working on a site that teaches Thai to German speakers. The Thai tones are written with combining diacritical marks:

    ̀x U+0300
    ̂x U+0302
    ́x U+0301
    ̌x U+030C

    x is only a placeholder; it can be a, ä, e, i, o, ɔ, ö, u, or ü.

    Because farangs usually do not know how to type tone marks into the search form, or when and how tone marks are used, I would like the search engine to treat all of these variants as the same character:

    Accents should be removed, except for the German umlauts (ä, ö, ü), and ɔ should be treated as o:

    a = à â á ǎ
    ä = ä̀ ä̂ ä́ ä̌
    e = è ê é ě
    i = ì î í ǐ
    o = ò ô ó ǒ ɔ ɔ̀ ɔ̂ ɔ́ ɔ̌
    ö = ö̀ ö̂ ö́ ö̌
    u = ù û ú ǔ
    ü = ǜ ü̂ ǘ ǚ

    This already works for precomposed single characters such as ǜ or ǎ, but not for combining diacritical marks, because those letters always consist of two characters. For example, u umlaut with circumflex: ü̂ = U+00FC U+0302
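
    To make the two-character problem concrete, here is a minimal standalone sketch (not part of Relevanssi). It counts the code points in ü̂ and strips only the four tone marks listed above, which leaves the umlaut intact. The function name `rlv_sketch_strip_tone_marks` is hypothetical, and the sketch assumes PHP 7 or later for the `\u{...}` string escapes.

    ```php
    <?php
    // "ü̂" written as explicit code points: U+00FC (ü) followed by U+0302 (combining circumflex).
    $u_umlaut_circumflex = "\u{00FC}\u{0302}";

    // With the /u modifier, "." matches one code point, so this counts code points.
    $code_points = preg_match_all( '/./us', $u_umlaut_circumflex );

    // Hypothetical helper: remove only the combining grave, acute, circumflex and caron.
    // Precomposed letters such as ǎ (U+01CE) are not affected by this pattern.
    function rlv_sketch_strip_tone_marks( $text ) {
    	return preg_replace( '/[\x{0300}\x{0301}\x{0302}\x{030C}]/u', '', $text );
    }

    $stripped = rlv_sketch_strip_tone_marks( $u_umlaut_circumflex );
    ```

    Here `$code_points` is 2 and `$stripped` is the single character ü (U+00FC). Whether stripping happens at indexing time or the variants are expanded at search time is a separate design choice; the replacement arrays below take the second route.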

    I am using a modified replacement function at the moment:

    $replacement_arrays = apply_filters(
        'relevanssi_accents_replacement_arrays',
        array(
            'from' => array( 'a', 'c', 'e', 'i', 'o', 'u', 'n', "'" ),
            'to'   => array( '(a|á|à|â)', '(c|ç)', '(e|é|è|ê|ë)', '(i|í|ì|î|ï)', '(o|ó|ò|ô|õ)', '(u|ú|ù|ü|û)', '(n|ñ)', "('|’)?" ),
        )
    );

    but I have no idea how to extend it for combining diacritical marks. All my attempts have failed.

    Here is an example of what works and what does not:

    A search for “xax xäx xex xix xox xöx xux xüx” should find (and highlight) all words beginning with “x” on that test page. The red words with x are correct, but the black ones are neither found nor highlighted because the accent replacement function is wrong.

    https://thai.schule/?s=xax+x%C3%A4x+xex+xix+xox+x%C3%B6x+xux+x%C3%BCx

    https://thai.schule/test/?highlight=xax%20x%C3%A4x%20xex%20xix%20xox%20x%C3%B6x%20xux%20x%C3%BCx

    How can I achieve that?

    Thank you!

Viewing 1 replies (of 1 total)
  • Plugin Author Mikko Saari

    (@msaari)

    This seems to work:

    add_filter( 'relevanssi_accents_replacement_arrays', 'rlv_add_diacriticals' );
    function rlv_add_diacriticals( $array ) {
    	$array['from'][] = 'ä';
    	$array['to'][]   = '(ä|ä̀|ä̂|ä́|ä̌)';
    	return $array;
    }
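
    The same pattern could be generalized to the other umlaut letters from the question. The following is only a sketch under stated assumptions: the names `rlv_sketch_tone_variants` and `rlv_sketch_add_umlaut_tones` are hypothetical, the base letters are assumed to be stored as precomposed code points (ä = U+00E4, ö = U+00F6, ü = U+00FC), and only letters that are not already in the default 'from' list are appended, as in the reply above. It has not been tested against Relevanssi itself.

    ```php
    <?php
    // Hypothetical helper: build an alternation such as '(ä|ä̀|ä̂|ä́|ä̌)' for one base letter.
    function rlv_sketch_tone_variants( $base, $extra = array() ) {
    	// Combining grave, circumflex, acute and caron, in the order used in the reply above.
    	$marks    = array( "\u{0300}", "\u{0302}", "\u{0301}", "\u{030C}" );
    	$variants = array( $base );
    	foreach ( $marks as $mark ) {
    		$variants[] = $base . $mark;
    	}
    	return '(' . implode( '|', array_merge( $variants, $extra ) ) . ')';
    }

    // Hypothetical filter callback: append ä, ö and ü with their tone-mark variants.
    function rlv_sketch_add_umlaut_tones( $array ) {
    	$bases = array(
    		"\u{00E4}" => array(),                                   // ä
    		"\u{00F6}" => array(),                                   // ö
    		"\u{00FC}" => array( "\u{01DC}", "\u{01D8}", "\u{01DA}" ), // ü plus precomposed ǜ, ǘ, ǚ
    	);
    	foreach ( $bases as $base => $extra ) {
    		$array['from'][] = $base;
    		$array['to'][]   = rlv_sketch_tone_variants( $base, $extra );
    	}
    	return $array;
    }

    // Register only when running inside WordPress, so the file also loads standalone.
    if ( function_exists( 'add_filter' ) ) {
    	add_filter( 'relevanssi_accents_replacement_arrays', 'rlv_sketch_add_umlaut_tones' );
    }
    ```

    Generating the alternations in a loop avoids typing the combining sequences by hand, where an editor may silently normalize or drop the marks. Use either this callback or the one above, not both, since both append an entry for ä.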