Hi @mgearh,
Thank you for your words! I appreciate it, and also prefer micro plugins that solve specific problems.
I’m considering something more robust to handle huge websites like yours. Would you be available to test it if I add an improvement in the next version?
I’m also planning to save the vectors externally, in a separate table or an external service. For now, though, the default solution is the best performance we can offer in this initial version.
Thank you!
Samuel
Thread Starter
mgearh
(@mgearh)
For 21k posts, we will definitely want to use something like Pinecone to store all of the embedded data. Our site is very busy and we don’t want to overtax MariaDB. We are very willing to help test new versions, but not until an embedding onboarding process for posts, CPTs, etc. and an external DB option are available. Thanks for your fast reply!
Just FYI, I recently added “AI Search” and am watching the same limitation on a WordPress site with 9K posts. I’m looking for a simple search plugin that improves on non-AI plugins, and this one looks interesting.
@richc @mgearh hello! Hope you both are well.
I’ve just shipped a new version of the plugin, where you can find more options in the Embeddings Generator tab. I ran some tests processing 600 posts at a time and it worked correctly.
There is an opportunity here to build a better import tool, but for now we’re focused on adding other AI providers and/or creating our own service for our users.
Let me know your thoughts!
The easiest way would be to add a “boost mode” setting, enabled by default upon activation, that schedules a cron job every minute to process posts without embeddings; once none are left, remove the cron.
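To make the idea concrete, here is a rough sketch of that boost-mode cron in WordPress terms. Everything here is hypothetical: the `myplugin_` hook names, the `_embedding` meta key, the batch size of 20, and the `myplugin_generate_embedding()` helper are placeholders, not the plugin’s actual API.

```php
<?php
// Sketch only: a one-minute WP-Cron task that embeds posts still missing
// an (assumed) '_embedding' meta key, then unschedules itself when done.

// WP-Cron has no built-in one-minute interval, so register one.
add_filter( 'cron_schedules', function ( $schedules ) {
	$schedules['every_minute'] = array(
		'interval' => 60,
		'display'  => 'Every Minute',
	);
	return $schedules;
} );

// Schedule the task when the plugin is activated.
register_activation_hook( __FILE__, function () {
	if ( ! wp_next_scheduled( 'myplugin_boost_embed' ) ) {
		wp_schedule_event( time(), 'every_minute', 'myplugin_boost_embed' );
	}
} );

add_action( 'myplugin_boost_embed', function () {
	// Grab a small batch of posts that have no embedding yet.
	$pending = get_posts( array(
		'post_type'      => 'post',
		'posts_per_page' => 20,
		'fields'         => 'ids',
		'meta_query'     => array(
			array( 'key' => '_embedding', 'compare' => 'NOT EXISTS' ),
		),
	) );

	if ( empty( $pending ) ) {
		// Nothing left: "dump the cron".
		wp_clear_scheduled_hook( 'myplugin_boost_embed' );
		return;
	}

	foreach ( $pending as $post_id ) {
		// myplugin_generate_embedding() stands in for the plugin's real
		// call to its embeddings provider.
		$vector = myplugin_generate_embedding(
			get_post_field( 'post_content', $post_id )
		);
		update_post_meta( $post_id, '_embedding', $vector );
	}
} );
```

Batching (20 posts per run here) keeps each cron tick short, which matters on a busy site where WP-Cron piggybacks on page loads.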
Another option: people reading this who are unfamiliar with the API could ask a model (at least o3) to write a PHP function to run in the WordPress root that generates the embeddings and updates the post meta the author uses.
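A minimal sketch of what such a one-off script might look like, assuming the OpenAI embeddings endpoint and an `_embedding` meta key — both assumptions, since the thread doesn’t say which meta key or provider the plugin actually uses:

```php
<?php
// Hypothetical one-off script dropped in the WordPress root: embed every
// post that lacks an (assumed) '_embedding' meta key. Meta key and model
// are placeholders; check what the plugin actually expects.
require __DIR__ . '/wp-load.php';

$api_key = getenv( 'OPENAI_API_KEY' );

$ids = get_posts( array(
	'post_type'      => 'post',
	'posts_per_page' => -1,
	'fields'         => 'ids',
	'meta_query'     => array(
		array( 'key' => '_embedding', 'compare' => 'NOT EXISTS' ),
	),
) );

foreach ( $ids as $post_id ) {
	$text = wp_strip_all_tags( get_post_field( 'post_content', $post_id ) );

	$response = wp_remote_post( 'https://api.openai.com/v1/embeddings', array(
		'headers' => array(
			'Authorization' => 'Bearer ' . $api_key,
			'Content-Type'  => 'application/json',
		),
		'body'    => wp_json_encode( array(
			'model' => 'text-embedding-3-small',
			'input' => $text,
		) ),
		'timeout' => 30,
	) );

	if ( is_wp_error( $response ) ) {
		continue; // In a real run, log the error and retry later.
	}

	$data = json_decode( wp_remote_retrieve_body( $response ), true );
	if ( isset( $data['data'][0]['embedding'] ) ) {
		update_post_meta( $post_id, '_embedding', $data['data'][0]['embedding'] );
	}
}
```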
Thread Starter
mgearh
(@mgearh)
Thanks for the update, @samuelsilvapt. This is helpful, but it does not solve my core problem for semantic search: ranking the final output beyond chunk retrieval. I would rather show recent content with less alignment than ancient content with full alignment.
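One common way to get that behavior is to blend the cosine similarity score with an exponential recency decay after retrieval. A minimal sketch, assuming a 0–1 similarity score per result; the half-life and blend weight are illustrative knobs, not plugin settings:

```php
<?php
// Hypothetical recency-aware re-ranking of semantic search results.
// Each result carries a cosine 'similarity' (0..1) and an 'age_days'.
function rerank_with_recency(
	array $results,
	float $half_life_days = 180.0,
	float $recency_weight = 0.3
): array {
	foreach ( $results as &$r ) {
		// Exponential decay: 1.0 for a brand-new post, 0.5 at the half-life.
		$recency    = pow( 0.5, $r['age_days'] / $half_life_days );
		$r['score'] = ( 1 - $recency_weight ) * $r['similarity']
			+ $recency_weight * $recency;
	}
	unset( $r );

	usort( $results, fn( $a, $b ) => $b['score'] <=> $a['score'] );
	return $results;
}

// With these weights, a day-old post at 0.8 similarity outranks a
// ten-year-old post at 1.0 similarity.
$ranked = rerank_with_recency( array(
	array( 'id' => 'ancient', 'similarity' => 1.0, 'age_days' => 3650 ),
	array( 'id' => 'recent',  'similarity' => 0.8, 'age_days' => 1 ),
) );
```

Tuning `$recency_weight` up (or the half-life down) pushes results further toward freshness; setting the weight to 0 recovers pure similarity ranking.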