Yeah, sounds like this is caused by too much content. I've never tested with this much data, so I don't know the exact mechanism here. What kind of searches time out? Few search terms, many search terms, common search terms?
I created an 80,000-word post on my test site, and Relevanssi finds it pretty much as fast as any shorter post.
Can you install the Query Monitor plugin and let me know what that reports – it will show you which are the slow queries. That would shed some light on this and maybe hint at a solution.
Dear Mikko,
thank you very much for your quick response 🙂
I just tried it with one-word search terms. How common the search term is appears to make a difference: the Query Monitor report below comes from a search for “asd” (nonsense), whereas a search for “technology” results in an internal server error (probably a timeout). I know there is a long blog post that should be found with the search term “technology”.
Query Monitor flagged two queries as slow; however, the “time” it shows for them is not the real duration of the search query, which took about 30 seconds.
Does this give you additional insights? 🙂
Query 1:
SELECT DISTINCT(relevanssi.doc), relevanssi.*, relevanssi.title * 10 + relevanssi.content + relevanssi.comment * 0.75 + relevanssi.tag * 5 + relevanssi.link * 0 + relevanssi.author + relevanssi.category * 3 + relevanssi.excerpt + relevanssi.taxonomy + relevanssi.customfield + relevanssi.mysqlcolumn AS tf
FROM wp_relevanssi AS relevanssi
WHERE (term LIKE '%asd'
OR term LIKE 'asd%')
AND ((relevanssi.doc IN (SELECT DISTINCT(posts.ID)
FROM wp_posts AS posts
WHERE posts.post_type NOT IN ('revision', 'nav_menu_item', 'custom_css', 'customize_changeset', 'feedzy_categories', 'mt_pp')))
OR (doc = -1))
ORDER BY tf DESC
LIMIT 500
Caller: relevanssi_search()
wp-content/plugins/relevanssi/lib/search.php:513
Time: 0.0911
Query 2:
SELECT COUNT(DISTINCT(relevanssi.doc))
FROM wp_relevanssi AS relevanssi
WHERE (term LIKE '%asd'
OR term LIKE 'asd%')
AND ((relevanssi.doc IN (SELECT DISTINCT(posts.ID)
FROM wp_posts AS posts
WHERE posts.post_type NOT IN ('revision', 'nav_menu_item', 'custom_css', 'customize_changeset', 'feedzy_categories', 'mt_pp')))
OR (doc = -1))
Caller: relevanssi_search()
wp-content/plugins/relevanssi/lib/search.php:550
Time: 0.1447
Additional information: Besides the long text in my posts, I include PDFs in an iframe in every post. The reason is that I would like to have the PDFs displayed in the post, but at the same time have them searchable (thus, the extracted PDF text is in a hidden div, which can be searched by Relevanssi).
Maybe the inclusion of the PDFs in an iframe could also be an issue which causes the Relevanssi search to be so slow?
A typical post looks like:
<div class="hidden"> --- PDF extracted text --- </div>
<div id="pdfviewer">
<iframe class="pdf_document" src="/wp-content/pdf.js/web/viewer.html?file=URL.pdf">
</iframe>
</div>
So clearly the problem is not in the database queries. If the queries take less than 0.3 seconds, they’re not the reason for 30-second search times. Searching in itself shouldn’t be very slow, and shouldn’t really be affected by long posts; it doesn’t matter in the database how long the posts are.
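To put numbers on that, here is a quick sketch that adds up the two times Query Monitor reported above and compares them with the roughly 30-second search. It is plain arithmetic on the figures from this thread (Python used purely for illustration), and the variable names are made up for the example.

```python
# Times reported by Query Monitor for the two queries above, in seconds.
query_times = [0.0911, 0.1447]

# Approximate wall-clock duration of the search, as reported in the thread.
observed_search_time = 30.0

total_query_time = sum(query_times)
share = total_query_time / observed_search_time

print(f"Database time: {total_query_time:.4f} s")
print(f"Share of the search time: {share:.1%}")
```

The two queries together account for well under 1% of the observed time, so the remaining time must be spent somewhere other than in these queries.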
Are you using custom excerpts? If you are, that’s probably the reason for the timeout. Creating custom excerpts from long posts can take a lot of time. Does the problem go away if you disable custom excerpts?
Also, I’m guessing you have the excerpt length defined in characters. Switching to counting words is probably enough to solve the issue.
I created a post that’s over 200,000 words long. Creating a 300-character excerpt took 37 seconds, while creating a 30-word excerpt from the same post was done in less than 2 seconds.
Dear Mikko,
indeed, this seems to be the issue! Disabling custom excerpts makes the search considerably faster. Unfortunately, the excerpt length was already set to “30 words”, so I have to deactivate custom excerpts completely.
Is there perhaps a workaround that would make them usable? Custom excerpts are a really convenient feature… it would be a shame to have to do without them.
In any case, thank you already for your help! 🙂
Requin
Using words is much faster than using characters, but as I found out in my tests, it can still take 1.5 seconds to create a 30-word excerpt from a 200,000-word post. Not a problem if that’s the only result, but if you have 30 posts like that, you’re still looking at 45 seconds. It’s not the 18 minutes or so it would take with 300-character excerpts, but it’s still too much.
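The estimate above can be reproduced with a small calculation. This is only a back-of-the-envelope sketch that assumes excerpts are built one after another and that every post costs the same as the test post; the function name is hypothetical and Python is used just for illustration.

```python
def total_excerpt_time(posts: int, seconds_per_excerpt: float) -> float:
    """Estimated total time if excerpts are built one after another."""
    return posts * seconds_per_excerpt


POSTS = 30  # number of very long posts among the search results

word_based = total_excerpt_time(POSTS, 1.5)   # 30-word excerpts
char_based = total_excerpt_time(POSTS, 37.0)  # 300-character excerpts

print(f"Word-based: {word_based:.0f} s")
print(f"Character-based: {char_based / 60:.1f} min")
```

Either total is far beyond what a typical web request can take before timing out, which is why even the faster word-based excerpts do not help with this many long posts.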
If you want to help me a bit, here’s some improved code: https://gist.github.com/msaari/a6d97668cadcebcc80a90f8cd843868d
This is a replacement for the relevanssi_create_excerpt() function that should create excerpts faster. It may slightly reduce the quality of excerpts, but if it makes excerpts possible, then that’s a bonus, right?
Hi Mikko,
I was just about to post, but then I saw you replied first.
I have solved it for now by trimming the post content to about 20,000 characters. This should be enough to capture the table of contents with the relevant keywords, so that the documents can still be found in the search.
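For anyone wanting to do something similar, here is a minimal sketch of the trimming idea. It is written in Python purely for illustration; on a real WordPress site this would be done in PHP, and the function name and the word-boundary handling are assumptions of this example, not what was actually used here.

```python
def trim_content(text: str, max_chars: int = 20_000) -> str:
    """Cut text to at most max_chars characters.

    If the cut lands in the middle of a word, back up to the last
    space or newline so that no partial word is left at the end.
    """
    if len(text) <= max_chars:
        return text

    cut = text[:max_chars]

    # The character right after the cut tells us whether a word was split.
    if not text[max_chars].isspace():
        last_break = max(cut.rfind(" "), cut.rfind("\n"))
        if last_break > 0:
            cut = cut[:last_break]

    return cut.rstrip()


# Example: a long document shrinks to the limit, a short one is untouched.
long_text = "word " * 10_000  # 50,000 characters
trimmed = trim_content(long_text)
print(len(trimmed))
```

The trade-off is that anything beyond the cut-off is no longer available to whatever consumes the trimmed text, so this only works when the important keywords appear early in the document.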
Nevertheless, I’ll try your improved code to make it even faster! I’ll tell you if there are any issues with it.
Thank you for your great support! 🙂
Requin