Title: Override WordPress content import (custom preprocessing)
Last modified: May 17, 2026

---

# Override WordPress content import (custom preprocessing)

 *  Resolved [samymsa](https://wordpress.org/support/users/samymsa/)
 * (@samymsa)
 * [3 weeks, 1 day ago](https://wordpress.org/support/topic/override-wordpress-content-import-custom-preprocessing/)
 * Hi Maxwell,
 * I’m using mxchat with complex Elementor-built pages that include heavily nested
   structures (e.g. vc-rows, shortcodes, nested containers). While the current
   tag-stripping approach helps, it doesn’t produce reliable or structured enough
   content for RAG use.
 * I also tried the URL and sitemap import options, but in our setup (preloader 
   + JS-rendered content) they only capture placeholder HTML rather than the final
   rendered page. Not a blocker for me right now, just noting it in case it matters.
 * What actually worked well was manually having ChatGPT transform page content 
   into clean, structured knowledge-base text for RAG.
 * My question: is there a way to override or hook into the WordPress content import
   step in mxchat so I can inject a custom preprocessing function (e.g. HTML → structured
   Markdown/clean text) before indexing? I think hooking into `mxchat_before_process_post`
   could do the trick.
 * Any pointers to relevant hooks or extension points would be appreciated.
 * Thanks,
   Samy

Viewing 1 replies (of 1 total)

 *  Plugin Support [m4xw3ll](https://wordpress.org/support/users/m4xw3ll/)
 * (@m4xw3ll)
 * [3 weeks ago](https://wordpress.org/support/topic/override-wordpress-content-import-custom-preprocessing/#post-18911573)
 * Hey [@samymsa](https://wordpress.org/support/users/samymsa/),
 * Good instinct – both options are there in 3.2.5.
 * **Option 1 – filter (easiest for in-WP transforms):**
 * Use `mxchat_before_process_post`. It runs right before MxChat builds the KB text
   from a post, and it hands you the full WP_Post object plus the bot_id. You can
   render the content through `the_content` so Elementor and shortcodes resolve,
   then overwrite `post_content` with the cleaned version. Something like:
 * `add_filter('mxchat_before_process_post', function ($post, $bot_id) { $rendered
   = apply_filters('the_content', $post->post_content); /* strip Elementor wrappers,
   vc-rows, etc. however you like */ $post->post_content = your_clean_for_rag($rendered);
   return $post; }, 10, 2);`
 * That way the existing “Add to Knowledge” flow keeps working, you just feed it
   cleaner input.
 * **Option 2 – the new REST API:**
 * If you’d rather build the clean text outside WordPress (your own pipeline, n8n,
   a script, whatever) and push it in, hit `POST /wp-json/mxchat/v1/knowledge` with
   a bearer token from MxChat → API Access. Body takes `content`, `source_url`
   (dedupe key, so re-submitting the same URL replaces the entry), and optional
   `bot_id` and `content_type`. That bypasses the post crawler entirely and gives
   you full control over what gets embedded.
 * For your Elementor case I’d lean toward the filter since it stays inside the 
   normal admin flow, but the API is there if you want to do the cleaning in a separate
   environment.
 * Maxwell
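
The `your_clean_for_rag()` call in Option 1 is left to the reader, so here is a minimal sketch of what such a helper might look like. It is hypothetical and not part of MxChat. It assumes the rendered HTML is reasonably well formed, uses regular expressions rather than a DOM parser, and its shortcode pattern is blunt enough to also remove ordinary bracketed words such as `[sic]`. The hook registration repeats the filter from the reply and is guarded so the file also loads outside WordPress.

```php
<?php
/**
 * Hypothetical helper (not shipped with MxChat): reduce rendered HTML to
 * plain, paragraph-separated text with Markdown-style headings and bullets.
 */
function your_clean_for_rag(string $html): string
{
    // Drop <script> and <style> blocks together with their contents.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Turn <h1>..<h6> into "#"-prefixed lines so section structure survives.
    $text = preg_replace_callback(
        '#<h([1-6])\b[^>]*>(.*?)</h\1>#is',
        static fn (array $m): string =>
            "\n\n" . str_repeat('#', (int) $m[1]) . ' ' . trim(strip_tags($m[2])) . "\n\n",
        $text
    );

    // Each list item starts a new "- " bullet line.
    $text = preg_replace('#<li\b[^>]*>#i', "\n- ", $text);

    // Closing block-level tags and <br> become line breaks.
    $text = preg_replace('#</(p|div|section|ul|ol|tr|table)>|<br\s*/?>#i', "\n", $text);

    // Remove every remaining tag, then any shortcode left unrendered, e.g. [vc_row].
    $text = strip_tags($text);
    $text = preg_replace('#\[/?[a-zA-Z][\w-]*[^\]]*\]#', '', $text);

    // Decode entities, then collapse spaces (including non-breaking ones) and blank lines.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = preg_replace('/[ \t\x{00A0}]+/u', ' ', $text);
    $text = preg_replace('/ *\n */', "\n", $text);
    $text = preg_replace('/\n{3,}/', "\n\n", $text);

    return trim($text);
}

// Register the filter only when WordPress is loaded.
if (function_exists('add_filter')) {
    add_filter('mxchat_before_process_post', function ($post, $bot_id) {
        // Render first so Elementor widgets and shortcodes resolve to HTML.
        $rendered = apply_filters('the_content', $post->post_content);
        $post->post_content = your_clean_for_rag($rendered);
        return $post;
    }, 10, 2);
}
```

Given a small Elementor-style fragment with an `<h2>`, a paragraph, and a two-item list, the helper returns a `## ` heading line, the paragraph, and two `- ` bullets separated by blank lines. A DOM-based approach (for example `DOMDocument`) would be more robust against malformed markup, at the cost of more code.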

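For Option 2, the request can be sketched as follows. Only the endpoint path and the field names `content`, `source_url`, `bot_id`, and `content_type` come from the reply. The JSON body, the `Authorization: Bearer <token>` header format, the string type for `bot_id`, the 30-second timeout, and both function names are assumptions made for illustration, so check them against the plugin's API documentation before relying on them.

```php
<?php
/**
 * Hypothetical helper: build the URL and HTTP arguments for one
 * knowledge submission. Optional fields are omitted when null.
 */
function build_mxchat_knowledge_request(
    string $site_url,
    string $token,
    string $content,
    string $source_url,
    ?string $bot_id = null,
    ?string $content_type = null
): array {
    $body = [
        'content'    => $content,
        // Dedupe key: re-submitting the same URL replaces the entry.
        'source_url' => $source_url,
    ];
    if ($bot_id !== null) {
        $body['bot_id'] = $bot_id;
    }
    if ($content_type !== null) {
        $body['content_type'] = $content_type;
    }

    return [
        'url'  => rtrim($site_url, '/') . '/wp-json/mxchat/v1/knowledge',
        'args' => [
            'headers' => [
                // Assumed header format for the bearer token.
                'Authorization' => 'Bearer ' . $token,
                'Content-Type'  => 'application/json',
            ],
            // Assumed: the endpoint accepts a JSON-encoded body.
            'body'    => json_encode($body, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE),
            'timeout' => 30,
        ],
    ];
}

/**
 * Send the request from inside WordPress using the core HTTP API.
 * Returns the HTTP status code, or 0 if the request itself failed.
 */
function send_mxchat_knowledge(array $request): int
{
    $response = wp_remote_post($request['url'], $request['args']);
    if (is_wp_error($response)) {
        error_log('MxChat knowledge push failed: ' . $response->get_error_message());
        return 0;
    }
    return (int) wp_remote_retrieve_response_code($response);
}
```

From an external pipeline the same URL, headers, and body can be sent with any HTTP client. `send_mxchat_knowledge()` is only useful where WordPress core functions are available.
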


 * 1 reply
 * 2 participants
 * Last reply from: [m4xw3ll](https://wordpress.org/support/users/m4xw3ll/)
 * Last activity: [3 weeks ago](https://wordpress.org/support/topic/override-wordpress-content-import-custom-preprocessing/#post-18911573)
 * Status: resolved