The feed at http://blogs.disneylatino.com/feed/ is perfectly valid. Any RSS aggregator should be able to parse it and remove anything that is not needed. What is this for?
To quote the person at my company who requested this:
“In the description and body content of each item, would you be able to remove any HTML markup, javascript, CSS, etc? In the past, we’ve crawled feeds including these, but those feeds tend to break when some random sequence of characters get introduced, which negatively affects search indexing.”
So, while the RSS feed is valid, the company’s internal search crawler seems to occasionally break when it encounters markup, JavaScript, or CSS in the feed. I know it’s a strange and very specific request.
You would need to create this special feed yourself – although, frankly, I think it’s the internal search or indexing that needs sorting.
Yeah, I fully agree that it’s the search/crawler that needs fixing, rather than jury-rigging every RSS feed.
I was just reading through that entry in the Codex – any functions or snippets of code I could use to avoid outputting the non-XML portions of the feed?
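Within WordPress itself, wp_strip_all_tags() removes HTML tags along with the contents of script and style elements, and it can be hooked onto the the_excerpt_rss and the_content_feed filters in a custom feed. If changing the theme or a plugin is not an option, the same idea can be applied outside WordPress by post-processing the feed before the crawler sees it. Below is a minimal sketch of that approach in Python, using only the standard library. The names strip_markup and clean_feed are hypothetical helpers made up for illustration, and the sketch assumes the item text lives in description and content:encoded elements, which is typical for WordPress feeds but worth checking against the actual feed.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Namespace WordPress feeds normally use for <content:encoded> (assumption: standard RSS content module).
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


class _TextOnly(HTMLParser):
    """Collects text nodes, skipping everything inside <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent blocks do not run together,
        # then collapse all runs of whitespace to single spaces.
        return " ".join(" ".join(self._chunks).split())


def strip_markup(html_text):
    """Hypothetical helper: return plain text with tags, scripts and styles removed."""
    parser = _TextOnly()
    parser.feed(html_text)
    parser.close()
    return parser.text()


def clean_feed(rss_xml):
    """Hypothetical helper: strip markup from each item's description and content:encoded."""
    ET.register_namespace("content", CONTENT_NS)
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        for tag in ("description", f"{{{CONTENT_NS}}}encoded"):
            node = item.find(tag)
            if node is not None and node.text:
                node.text = strip_markup(node.text)
    return ET.tostring(root, encoding="unicode")
```

Calling clean_feed() on the downloaded feed text returns the feed as a string with plain-text item bodies, which could then be served to the crawler in place of the original. Because text chunks are joined with spaces, a word split by inline markup (for example half of it in bold) would come out as two words; that seemed an acceptable trade-off for search indexing, but it is a design choice rather than a requirement.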