WordPress.org

[Plugin: Relevanssi] utf-8 support (30 posts)

  1. levani01
    Member
    Posted 5 years ago

    I like this plugin very much, but unfortunately it doesn't support UTF-8 content: it shows ???????? symbols instead of text. I would be very glad to see this issue fixed in a future release.

  2. Mikko Saari
    Member
    Posted 5 years ago

    Actually, I've only tried this plugin with UTF-8 content, so the problem is somewhere else. The plugin uses multibyte string operations and so on.

    Please give me more details: where does it show the symbols, what settings are you using and so on.

  3. mpiftex
    Member
    Posted 5 years ago

    I'm having a different problem, but it might be related to encoding too.
    I have a blog whose posts are in Greek (which means no Latin characters). Every time I search for a Greek word, there are no results. If I search for an English word contained in the posts, I get results.
    I deactivated the plugin, and results are found again for Greek words.

    Any ideas?

  4. Mikko Saari
    Member
    Posted 5 years ago

    Does Relevanssi index your content properly? For example, if you check the "25 most common words" list on the plugin settings page, does it list Greek words?

  5. Mikko Saari
    Member
    Posted 5 years ago

    I did a quick test and Relevanssi was able to index and search Greek text without problems.

  6. mpiftex
    Member
    Posted 5 years ago

    Thanks for the reply, msaari. It's weird that it works for you!
    To answer your question: no, all the indexed common words are English words.

    I don't know if it makes any difference, but I have the latest WP version (2.8.2), the db collation is utf8_general_ci, and I'm running it locally using XAMPP.

    I was looking at the plugin tables, and I noticed that in wp_relevanssi the vast majority of entries under "terms" are empty. Is that normal?

  7. Mikko Saari
    Member
    Posted 5 years ago

    No, that is not normal... For some reason the Greek terms aren't making it to the index. When I look at the MySQL table in my test case, the Greek terms appear as question marks.

  8. mpiftex
    Member
    Posted 5 years ago

    I created a new blog (v2.8.2, utf8_general_ci) out of the box: default theme, no other plugins. I added a couple of posts in Greek, installed Relevanssi, and the same thing happens: no Greek words are indexed or found during search.

    Too bad, because it seems to be a very helpful plugin (tested with English terms)! Back on the quest for a good search plugin, then...

    Thanks for your help, msaari. If you can figure out what's wrong and maybe include a fix in a later version, that would be great, and I would be more than happy to give it another try! If you have any ideas for a quick fix or something, I'll be checking back on this topic. Thanks for your time.

  9. Mikko Saari
    Member
    Posted 5 years ago

    I repeated the same experiment, and everything works just as it should. WP 2.8.2, MySQL 4.1.22, utf8_general_ci.

    One thing comes to mind, though - when I test this, I copy-paste Greek text from another website (for example from this Lorem Ipsum page). Does that make any difference?

    Another thing you can try: on line 645 you can find the query that inserts the terms in the index. You could try echoing that query instead of feeding it to wpdb->query, to see if the terms are present. If they are, the problem is definitely with MySQL.

    That's my guess, anyway - something in your database setup that doesn't co-operate. As long as I'm unable to reproduce the problem, I can't help much more.

  10. mpiftex
    Member
    Posted 5 years ago

    This is really weird. First of all, I changed line 645 as you said, and all the Greek words appear as question marks: ????. Same result in both blogs.

    I repeated the experiment: new blog, same version and encoding, and a few new posts. I installed Relevanssi, and now for every Greek word I search for, all the posts are returned as results, even if the word does not exist in the post. This happens even if the "word" I search for is a random combination of 20 characters.

    Now back to my first blog, which has real posts. After deleting the plugin and its tables in the db and reinstalling it, again no results are returned for any search in Greek. It keeps working as it should when an English word is used.

    In both blogs, the posts are a combination of real posts that I have written and dummy text from the same site you got your text from. I don't think it makes a difference, though. I played around with the db collation and the respective options in wp-config.php with no luck.

    I understand that since you can't see the problem, you can't just guess what it might be, but I really appreciate you trying!

  11. Mikko Saari
    Member
    Posted 5 years ago

    I think the question marks are ok, since that's how the Greek characters appear in the db in my blog where everything works. If the database still shows nothing, then I'm fairly sure the problem is with the database.

    Assuming it's a db problem, you could try searching for MySQL support. With some quick googling, I found these:

    MySQL database problem with Greek.
    Some MySQL bug involving Greek; I couldn't look at it more closely because the server is down.

  12. smurkas
    Member
    Posted 4 years ago

    I am having the same issue with Swedish letters like å ä ö. Whenever a search word contains one of these characters, Relevanssi seems to cut off the search word at that letter. So if I search for lerägare (not a real word), Relevanssi searches for ler.

    The index also seems to cut off words at the special character; only the remainder of the word gets stored.

    å ä ö are stored as real characters in the database, and all the tables are utf8.

    At what point do you think the stripping occurs? Does it go wrong when Relevanssi hits the database, or before that?

    Kindly, Marcus.

  13. smurkas
    Member
    Posted 4 years ago

    Also, if I search for a word made up only of Swedish letters, I get mb_strpos() [function.mb-strpos]: Unknown encoding or conversion error. in relevanssi.php on line 756.

    The page character set is set to UTF-8 as well.

    Kindly, Marcus.

  14. smurkas
    Member
    Posted 4 years ago

    The issue seems to be the three occurrences of preg_replace() in the code. preg_replace() is not UTF-8 safe, so it should be switched to mb_ereg_replace() instead.
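    A minimal Python sketch of the failure mode described here (Relevanssi itself is PHP; Python is used only to illustrate). It assumes a hypothetical tokenizer that replaces everything except ASCII letters and digits with spaces. Run over raw UTF-8 bytes, that character class treats both bytes of 'ä' (0xC3 0xA4) as non-letters, so "lerägare" splits into "ler" and "gare", much like the cut-off words reported above. Run over decoded text with a Unicode-aware class, the word stays whole; in PHP, the usual way to get character-level matching from preg_replace() is the /u pattern modifier.

```python
import re

def tokenize_bytes(text: str) -> list[str]:
    # Hypothetical byte-oriented tokenizer: keeps only ASCII letters and
    # digits, so every byte of a multibyte character becomes a separator.
    raw = text.lower().encode("utf-8")
    cleaned = re.sub(rb"[^a-z0-9]+", b" ", raw)
    return [tok.decode("utf-8") for tok in cleaned.split()]

def tokenize_text(text: str) -> list[str]:
    # Unicode-aware version: the pattern runs on decoded characters, and
    # in Python 3 \W treats letters such as å, ä, ö as word characters.
    cleaned = re.sub(r"\W+", " ", text.lower())
    return cleaned.split()

print(tokenize_bytes("lerägare"))  # ['ler', 'gare']
print(tokenize_text("lerägare"))   # ['lerägare']
```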

    When I switch over, the search works properly for me. I get weird characters in the database now instead of cut-off words. I don't know if this affects the plugin overall in some way.

    Kindly, Marcus.

  15. smurkas
    Member
    Posted 4 years ago

    OK, there seems to be quite a lot that needs to be replaced for Relevanssi to work properly with UTF-8 content all the way.

    Since I really need something like this, I will try to fix everything UTF-8 related; hopefully I can do it!

    strtolower() in relevanssi_tokenize() ruins UTF-8 characters as well. I will try to follow your search function step by step and fix things along the way. Hopefully the search function will find the search term I'm using once I have gone through it, since the word is in the posts in my database.

    Kindly, Marcus

  16. Mikko Saari
    Member
    Posted 4 years ago

    Interesting. I can search for words with äöå in them without any problems at all! Same with the Greek letters... So there's something funny going on that somehow depends on the server setup or something like that. As for the Swedish (and Finnish) letters showing up funny in the database, that's curious too, because while the Greek text is all question marks when I look at the databases, äöå always appear correctly.
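    One possible explanation for this pattern (an assumption; nobody in the thread confirmed which character sets are involved): somewhere between PHP and MySQL the text passes through a single-byte character set such as Latin-1, for example a connection or column using MySQL's latin1. Latin-1 contains å, ä and ö but no Greek letters, and a lossy conversion writes '?' for every character it cannot represent. A Python sketch of that substitution:

```python
def store_as_latin1(text: str) -> bytes:
    # Simulates text going through a Latin-1 connection or column:
    # errors="replace" writes b"?" for each character Latin-1 lacks.
    return text.encode("latin-1", errors="replace")

print(store_as_latin1("äöå"))       # b'\xe4\xf6\xe5'  (all representable)
print(store_as_latin1("Ελληνικά"))  # b'????????'      (no Greek in Latin-1)
print(store_as_latin1("ű"))         # b'?'             (Hungarian ű is missing too)
```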

    I know the code is probably a bit patchy there - I did figure out I needed multibyte support when I got nasty results as multibyte characters were cut in two, but I admit I'm not a pro on the topic. Hence the use of preg_replace(), for example.

    Apparently strtolower() should be replaced with mb_convert_case().
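    A sketch, written in Python for illustration, of why strtolower() can damage UTF-8 while mb_convert_case($s, MB_CASE_LOWER, 'UTF-8') does not. strtolower() works byte by byte, and in PHP versions where it follows the server locale (before PHP 8.2), a single-byte locale such as ISO-8859-1 also lowercases bytes above 0x7F. The sketch assumes such a locale: each UTF-8 byte is read as a Latin-1 character and lowercased on its own, so the lead byte 0xC3 ('Ã' in Latin-1) becomes 0xE3 and the sequence breaks. Python's str.lower() stands in for the character-aware mb_convert_case().

```python
def bytewise_lower(text: str) -> str:
    # Assumed ISO-8859-1 locale: every UTF-8 byte is treated as one
    # Latin-1 character and lowercased independently.
    raw = text.encode("utf-8")
    lowered = raw.decode("latin-1").lower().encode("latin-1")
    # Decode the result as UTF-8 again; broken sequences become U+FFFD.
    return lowered.decode("utf-8", errors="replace")

def charwise_lower(text: str) -> str:
    # Character-aware lowercasing, the analogue of mb_convert_case().
    return text.lower()

print(charwise_lower("ÅSA"))  # åsa
print(bytewise_lower("ÅSA"))  # the å is replaced by U+FFFD
```

    Under this assumption even an already lowercase 'ä' (0xC3 0xA4) is damaged, because its lead byte is an uppercase letter in Latin-1, and the outcome depends on the server's locale setting.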

    Once you're done with the script, send your version to mikko at mikkosaari.fi and I'll see what you've done. It would really help debugging if I could set up a test system that doesn't work the way it should =)

  17. Mikko Saari
    Member
    Posted 4 years ago

    All you people with UTF-8 problems, check out the new 1.5 version with improved UTF-8 support, thanks to smurkas here.

  18. lorr
    Member
    Posted 4 years ago

    Hi guys,

    First of all, msaari, your plugin is simply ingenious! WP's default search simply sucks, so Relevanssi is a must-have! Thank you.

    Now, on to my issue, which is very similar to smurkas'. I am using the most recent version of Relevanssi, but I'm still facing this "trimming" problem, though only with 2 characters: 'á' and 'í'. That's quite weird, because handling e.g. 'ű' is just fine, and 'ű' is a rather special Hungarian character... so I don't get it.

    I also checked the DB, and all words are stored correctly BUT the ones that contain 'á' or 'í'. Those are trimmed, and consequently the search results for these words are gibberish.

    Any suggestion?

    Thanks,
    L

    P.S. to complicate things even further, under the "25 most popular queries" section all the words with 'á' and/or 'í' are shown flawlessly...

  19. Mikko Saari
    Member
    Posted 4 years ago

    Cryptic. I'd expect trouble with ű, not with perfectly usual á and í... I was able to repeat the problem with my test blog, so perhaps I can come up with a solution. I'll have to see what I can do.

  20. levani01
    Member
    Posted 4 years ago

    You say that the UTF-8 problems have been fixed, but I'm not satisfied with the results at all! I checked the database and noticed that in many cases it stores only part of a word, or even a single letter from it. Is that what it's supposed to do? Very often, if I search with two words, it doesn't find anything, even for an exact match!

    P.S. I use Georgian Unicode.

  21. Mikko Saari
    Member
    Posted 4 years ago

    Well, apparently there are still some problems. I must say I have no clue what's going on... for some reason á and í just don't work. When I look at my database through phpMyAdmin, the characters appear just fine, though the Hungarian ű is a ?.

    So, I'm afraid this problem exceeds my understanding. If someone else wants to give it a go, feel free.

  22. levani01
    Member
    Posted 4 years ago

    Unfortunately, I'm not familiar enough with programming to find the exact cause. I don't know whether it's too difficult to do or plugin developers don't pay attention to this, but it's a common problem with every advanced search plugin.

  23. Mikko Saari
    Member
    Posted 4 years ago

    I'd guess many US developers aren't really aware of the problems. Since Finnish needs a few additional characters, I'm aware of the issues, but my skills aren't enough to fix them...

  24. smurkas
    Member
    Posted 4 years ago

    Hey again, all.

    I'll have another go at it if you want, Mikko, to see if we can get this fixed. Hopefully I'll have some time to work on it in a couple of days; I'll send you an email with my progress.

  25. Adam W. Warner
    Member
    Posted 4 years ago

    I just installed this plugin and when trying to index for the first time, I received this error:

    Call to undefined function: mb_internal_encoding() in /home/content/a/a/a/myuser/html/wp-content/plugins/relevanssi/relevanssi.php on line 1203

    Line 1203 is this:

    mb_internal_encoding("UTF-8");

    Can anyone offer any suggestions on why this is happening? I am currently deactivating/reactivating all my plugins and testing, but I doubt it will have any effect.

  26. Mikko Saari
    Member
    Posted 4 years ago

    It would seem to me you don't have multibyte string support enabled in your PHP. You could try commenting that line out (put // before it).

  27. modaser
    Member
    Posted 4 years ago

    I have the same problem: the most common words in the index show up as ????.

    How can I fix it?

    My languages are Persian and Arabic.

    Thank you.

  28. Mikko Saari
    Member
    Posted 4 years ago

    Does the search work in general?

    Unfortunately, I'm unable to help here, as I can't really test properly with Arabic or Persian text.

  29. bobiasg
    Member
    Posted 4 years ago

    Hi All,

    In the function relevanssi_remove_punct, the statement "$a = mb_ereg_replace('[[:punct:]]+', ' ', $a);" is wrong. I changed it to "$a = mb_ereg_replace('/[[:punct:]]+/u', ' ', $a);"

    For example, when $str = "khương", relevanssi_remove_punct() returns "khư� ng" instead of "khương".
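    The reported output is consistent with the punctuation class being applied to single bytes instead of characters (an assumption about the failing setup). In UTF-8, 'ơ' is 0xC6 0xA1, and 0xA1 read on its own as Latin-1 is '¡', a punctuation mark. Replacing that byte with a space leaves the lead byte 0xC6 orphaned, which displays as '�' followed by the space. The Python sketch below reproduces this, using Unicode categories starting with 'P' as a stand-in for the POSIX [[:punct:]] class (also an assumption), next to a character-level version that leaves the word intact.

```python
import unicodedata

def is_punct(ch: str) -> bool:
    # Stand-in for a POSIX punctuation class: any Unicode "P*" category.
    return unicodedata.category(ch).startswith("P")

def remove_punct_bytewise(text: str) -> str:
    # Read every UTF-8 byte as one Latin-1 character, so continuation
    # bytes such as 0xA1 ('¡') can be mistaken for punctuation.
    as_latin1 = text.encode("utf-8").decode("latin-1")
    cleaned = "".join(" " if is_punct(c) else c for c in as_latin1)
    # Re-assemble the bytes; broken sequences decode to U+FFFD.
    return cleaned.encode("latin-1").decode("utf-8", errors="replace")

def remove_punct_charwise(text: str) -> str:
    # Work on decoded characters, so multibyte letters stay intact.
    return "".join(" " if is_punct(c) else c for c in text)

word = "kh\u01b0\u01a1ng"  # "khương" in NFC form
print(remove_punct_bytewise(word))  # khư� ng
print(remove_punct_charwise(word))  # khương
```

    Under the same assumption, 'á' (0xC3 0xA1) ends in the same 0xA1 byte and would be broken the same way. In PHP, one common way to get character-level matching is to run the pattern in UTF-8 mode, for example preg_replace('/[[:punct:]]+/u', ' ', $a), or to call mb_regex_encoding('UTF-8') before mb_ereg_replace('[[:punct:]]+', ' ', $a); note that mb_ereg_replace() patterns are written without /.../ delimiters or modifiers.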

    Thanks

  30. Mikko Saari
    Member
    Posted 4 years ago

    Thanks, will fix this in the next version.

Topic Closed

This topic has been closed to new replies.
