Problem with relevanssi_create_excerpt and unicode ?

leup
(@leup)

10 years, 10 months ago

Hi,

I have a problem with the function ‘relevanssi_create_excerpt’, and I don’t know why.

(FYI, I’m calling it from my own code to create an excerpt for some post metas.)

Some of my text is returned with “?” characters in it. I was able to fix the problem by replacing

$content = preg_replace('/\s+/', ' ', $content);

by

$content = preg_replace('/\s+/u', ' ', $content);

What’s weird is that sometimes it works even without the /u modifier.

I suspect there is a PHP bug here, but I’m not really sure, because it happens only sometimes, with the exact same text!

As all my texts are Unicode (UTF-8) encoded, I can go with the /u modifier, but that doesn’t feel right to me when the behavior is so erratic.
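
One possible explanation, offered as an assumption rather than a confirmed diagnosis: without /u, PCRE reads the subject byte by byte, and depending on the locale or PCRE build, \s may also match the byte 0xA0 (a no-break space in Latin-1). That byte is the second half of “à” in UTF-8 (C3 A0), so replacing it would leave a broken sequence that shows up as “?”. This would also explain why the problem comes and goes. A minimal sketch with a made-up sample string:

```php
<?php
// Sample text (illustrative): "à" is the two bytes C3 A0 in UTF-8,
// "é" is C3 A9. The string contains runs of ordinary whitespace.
$content = "Voil\xC3\xA0   la\n\tr\xC3\xA9ponse";

// Byte mode: whether 0xA0 counts as \s depends on the PCRE character
// tables in use (assumption), so this result can differ between setups.
$byte_mode = preg_replace('/\s+/', ' ', $content);

// UTF-8 mode: the subject is read as code points, so multi-byte
// characters are never split.
$utf8_mode = preg_replace('/\s+/u', ' ', $content);

// preg_match('//u', ...) returns 1 only if the string is valid UTF-8.
var_dump(preg_match('//u', (string) $byte_mode) === 1);
var_dump($utf8_mode === "Voil\xC3\xA0 la r\xC3\xA9ponse");
```

One thing to keep in mind with /u: if the input is not valid UTF-8, preg_replace returns NULL instead of a string, so the return value is worth checking.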

My setup is WordPress 4.4.2 and Relevanssi 3.4.2

PHP 5.5.12
Apache 2.4.9
MySQL 5.6.17

https://wordpress.org/plugins/relevanssi/

Viewing 7 replies - 1 through 7 (of 7 total)

Thread Starter leup
(@leup)

10 years, 10 months ago

In the same function, I have another problem.

There is this line (around line 234):
$term = " $term";

I understand that you are searching for whole words and not parts of words (not fuzzy), but this causes a problem with words that follow an apostrophe.

Example: query => “afrique”. If the text is “L’afrique”, the excerpt fails to find the term “ afrique”.

Also, if I understand this correctly, if the function “mb_stripos” does not exist, you do:

$titlecased = mb_strtoupper(mb_substr($term, 0, 1)) . mb_substr($term, 1);

and as the term always starts with a blank space, it fails to find the term with an uppercase first character.
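
Both issues can be reproduced outside the plugin. The sketch below is not Relevanssi's code; the sample text and variable names are illustrative, and it assumes the mbstring extension is available:

```php
<?php
// "L’afrique est un continent." with a typographic apostrophe (U+2019).
$text = "L\xE2\x80\x99afrique est un continent.";
$term = 'afrique';

// With the leading space, a word that follows an apostrophe is not found.
$with_space = mb_stripos($text, " $term", 0, 'UTF-8');

// Without it, the term is found at character index 2.
$without_space = mb_stripos($text, $term, 0, 'UTF-8');

// Title-casing the padded term only "uppercases" the space,
// so the string comes back unchanged.
$padded = " $term";
$unchanged = mb_strtoupper(mb_substr($padded, 0, 1, 'UTF-8'), 'UTF-8')
    . mb_substr($padded, 1, null, 'UTF-8');

// One possible fix: title-case the bare term first, then add the space.
$titlecased = ' ' . mb_strtoupper(mb_substr($term, 0, 1, 'UTF-8'), 'UTF-8')
    . mb_substr($term, 1, null, 'UTF-8');

var_dump($with_space);            // bool(false)
var_dump($without_space);         // int(2)
var_dump($unchanged === $padded); // bool(true)
var_dump($titlecased);            // string(8) " Afrique"
```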

Plugin Author Mikko Saari
(@msaari)

10 years, 10 months ago

Yes, that’s a bug with the first uppercase character. Adding the space is a bit complicated as well, as it makes sense in some situations and not so much in others.

I think adding the /u modifier makes sense, since WP content is pretty much always UTF8. I’ll have to see about the added space – something needs to be done with that, I’m just not quite sure what.

In general the whole excerpt-building is far from being the most brilliant bit of programming in Relevanssi =)

Thread Starter leup
(@leup)

10 years, 10 months ago

Hi ! Thanks for your answer ! 🙂

I removed the leading space character, as that suits my needs better, and added the /u modifier.

I understand why you add the leading space, but indeed it is far from perfect for every case. Maybe using regular expressions would be best? Well, it would not give you the position of the occurrence in the text… complex indeed. I will check what solutions exist on the internet ^^
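
On the position concern: preg_match can report where a match starts when called with PREG_OFFSET_CAPTURE, although the offset is counted in bytes rather than characters. A minimal sketch of a whole-word search built on that, assuming mbstring is available; find_whole_word is a hypothetical helper name, not something that exists in Relevanssi:

```php
<?php
// find_whole_word() is a hypothetical helper, not part of Relevanssi.
// Returns the character offset of the first whole-word match, or false.
function find_whole_word($text, $term)
{
    // The term must not be directly preceded or followed by a letter
    // or digit; apostrophes, hyphens and spaces all count as boundaries.
    $pattern = '/(?<![\p{L}\p{N}])' . preg_quote($term, '/') . '(?![\p{L}\p{N}])/iu';

    if (preg_match($pattern, $text, $match, PREG_OFFSET_CAPTURE) !== 1) {
        return false;
    }

    // PREG_OFFSET_CAPTURE gives a byte offset; convert it to characters.
    $byte_offset = $match[0][1];
    return mb_strlen(substr($text, 0, $byte_offset), 'UTF-8');
}

$text = "L\xE2\x80\x99afrique du Sud"; // "L’afrique du Sud"

var_dump(find_whole_word($text, 'afrique'));           // int(2)
var_dump(find_whole_word('Les afriques', 'afrique'));  // bool(false)
```

Because a match at the very start of the text returns 0, callers would need to compare the result with `=== false` rather than treating it as a boolean.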

Thread Starter leup
(@leup)

10 years, 10 months ago

I did a quick search

Google

I think these links could be useful

Drupal 7

Stackoverflow

WordPress plugin for search excerpts

Plugin Author Mikko Saari
(@msaari)

10 years, 10 months ago

Thanks, those should help.

Plugin Author Mikko Saari
(@msaari)

10 years, 9 months ago

Leup, I’m working on a better excerpt-building mechanism. If you’re interested in testing it, please drop me an email at mikko @ mikkosaari.fi.

Thread Starter leup
(@leup)

10 years, 7 months ago

Hi Mikko,

Sorry for the delay. It would definitely be interesting, but I don’t have much time right now to do any testing.

