Any way to strip ms word code from database en masse?

  • I’m in the middle of migrating a monster WordPress site I just inherited, with close to 3000 posts. I just realized the primary author has been copying and pasting from Word for the last 4 years. The old theme didn’t really suffer much from that, but the new one I created is getting really messed up. Is there any way to run a SQL query through phpMyAdmin that will seek out all embedded HTML code from MS Word and delete it? My MySQL query skills are close to nil. I am pretty confident there is no other HTML code intentionally in there, so it would work to just get rid of any HTML at all from the post content in the database.

    Is this sort of thing even possible?

Viewing 4 replies - 1 through 4 (of 4 total)
  • Yikes, that’s a messy one. Gotta love it when computer novices copy and paste rich text from Word into a web form. It would be really helpful if Word had a one-click “convert to plain text” feature for this kind of situation, since it comes up so often (yes, I know there is “paste special” but most people don’t know about that).

    Unfortunately, I don’t think it is possible to solve your problem with a MySQL query. In order to strip out all HTML tags, you would need to do a search and replace using wildcards (e.g. find ‘<.*>’ and replace with an empty string). But the MySQL REPLACE function does not support wildcards; it only matches literal strings.
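
    To make the wildcard idea concrete, here is a minimal sketch in Python (chosen only for illustration; the sample string is made up). It also shows a pitfall with the ‘<.*>’ pattern: ‘.*’ is greedy and runs from the first ‘<’ to the last ‘>’, so it can swallow the text between tags. A bounded pattern such as ‘<[^>]*>’ stops at the first ‘>’ instead.

```python
import re

# Hypothetical sample of Word-style markup, not taken from the site in question.
sample = '<p class="MsoNormal">Hello <b>world</b></p>'

# Greedy: '.*' matches from the first '<' to the last '>', so the text is lost too.
greedy = re.sub(r'<.*>', '', sample)

# Bounded: '[^>]*' cannot cross a '>', so each tag is matched on its own.
bounded = re.sub(r'<[^>]*>', '', sample)

print(repr(greedy))   # ''
print(repr(bounded))  # 'Hello world'
```

    Even the bounded pattern is only a rough tool: it leaves behind the contents of style blocks and does not understand comments, so treat it as a starting point rather than a complete cleaner.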

    The only way you could do this would be to write a custom PHP script that reads each post out of the database, performs a search and replace (the PHP string functions are much more capable), and then writes the post content back to the database. An experienced programmer could probably whip that up in a couple of hours. Still, it is a risky operation — you definitely want to make a backup of your WP database before running any such script.
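
    The read, clean, and write-back loop described above might look roughly like the sketch below. It is written in Python rather than PHP purely for illustration, and it leaves out the database access entirely: the rows are hard-coded stand-ins for (post ID, post content) pairs, and the names `TextOnly`, `strip_markup`, and `plan_changes` are hypothetical. It is a dry run that only reports what would change, which is the kind of trial you would want before touching a real database.

```python
from html.parser import HTMLParser

# Assumption: the contents of these elements are never prose worth keeping.
SKIP_TAGS = {"style", "script"}


class TextOnly(HTMLParser):
    """Keeps text nodes; drops tags, comments, and skipped elements."""

    def __init__(self):
        # convert_charrefs=True decodes entities, e.g. '&amp;' becomes '&'.
        # That is an assumption about the desired result, not a requirement.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_markup(content):
    """Return the content with all markup removed."""
    parser = TextOnly()
    parser.feed(content)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


def plan_changes(rows):
    """Dry run: list (post_id, cleaned) only for rows that would change."""
    changes = []
    for post_id, content in rows:
        cleaned = strip_markup(content)
        if cleaned != content:
            changes.append((post_id, cleaned))
    return changes


# Made-up rows standing in for what a real script would read from the database.
rows = [
    (1, '<p class="MsoNormal"><span style="font-family:Calibri">Hello</span> world</p>'),
    (2, "Already plain text"),
    (3, "<!--[if gte mso 9]><xml><w:WordDocument></w:WordDocument></xml><![endif]-->Intro"),
]
changes = plan_changes(rows)
print(changes)  # [(1, 'Hello world'), (3, 'Intro')]
```

    Note that this removes paragraph tags too, so paragraph breaks that existed only as markup are lost. A real script would need to decide how to preserve them, and would write the cleaned text back only after the dry-run output has been reviewed against a backup.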

  • Thanks for the input. I had a bad feeling about it.

  • If you want, contact me off-forum and we can discuss further. It’s a bigger problem than I can solve by posting code snippets on a message board. I’d want to set up a test environment and do some trial runs to make sure it would not totally hose your site, but it is doable.

  • Sent you an email.

  • The topic ‘Any way to strip ms word code from database en masse?’ is closed to new replies.