I'm in the process of modifying the solr-for-wordpress plugin to suit our needs. We need to index all of the posts in our multisite WordPress install (over 2k blogs) so that we can search for posts tagged with a certain tag. We use this data on a different site. Originally we were using the tag RSS feed for each site, but it's just too slow. Enter Solr.
I'm experienced with PHP but new to WordPress, and I'm having some trouble figuring out how to get post content into Solr in a form that is at least similar to how it would look in the RSS feed. That is, I need it stripped of HTML tags and, specifically, of special tags like [caption]. Is there a function or filter that can do this for me?
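
To make the target concrete, here is a rough sketch of the transformation I'm after, written in plain PHP with no WordPress calls. The function name `plain_text_for_index` is made up for illustration, and the regex is only my guess at what counts as a special tag (anything shaped like `[name ...]` or `[/name]`), so it would also eat ordinary bracketed text. It drops the bracketed tags themselves but keeps whatever text sits between them. I'm hoping WordPress has a built-in equivalent that does this properly.

```php
<?php
// Hypothetical helper, not part of WordPress or solr-for-wordpress.
// Assumption: a "special tag" is anything shaped like [name ...] or [/name].
function plain_text_for_index(string $content): string
{
    // Remove the bracketed tags themselves; text between them is kept.
    $text = preg_replace('/\[\/?[a-zA-Z][\w-]*(?:\s[^\]]*)?\]/', '', $content);

    // Remove HTML tags (built-in PHP function).
    $text = strip_tags($text);

    // Turn entities such as &amp; back into plain characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace left behind by the removed markup.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

$post = '<p>Hello <strong>world</strong></p>'
      . '[caption id="attachment_1" width="300"]<img src="a.jpg" /> A photo[/caption]';

echo plain_text_for_index($post), "\n";
```

For the sample post above, the output I'd want in the Solr document is the text with no markup and no [caption] wrapper left in it.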