WordPress.org

Clean HTML from MS Word.

  • I hope someone can help me. I am at my wits’ end. 🙁

    I am looking for a WYSIWYG editor that produces standards-compliant code, particularly when copying and pasting from MS Word. I’ve just spent the last 7 hours looking for a solution, to no avail.

    Before you go off on a tangent and rant against WYSIWYG (as so many people seem to do when someone asks a WYSIWYG question), I ought to point out that there are situations where WYSIWYG editors are necessary. Myself, I have been hand-coding HTML in Notepad since back when Lynx was pretty much the only web browser option (yes, I’m that old). But this site will be used as a CMS by people who are not HTML-savvy and who have much better things to do than learn HTML.

    I have downloaded and tried several plugins:
    – Chenpress
    – WYSI-Wordpress
    – Xinha4WP

    I had a look at WYSIWYG Pro.

    I had a play with X-Valid.

    Obviously, I also tried the built-in TinyMCE.

    None of these let you cut and paste from MS Word and get clean HTML, free of extra classes, spans, and formatting.

    The only editor that seems to handle that properly is XStandard. But I can’t find anyone who has integrated XStandard into WordPress. I’ve spent quite a bit of time on their site, and I just don’t see how to do such an integration.

    Surely I’m not the only one with this need? It’s not asking for much: I just want to be able to cut and paste from Word into the editor without cr*p code showing up.

    Has anyone found a solution?

    // Edit: Saving a Word doc as text and then copying and pasting from Notepad into WP is not a desirable solution.

Viewing 15 replies - 1 through 15 (of 25 total)
  • PureText? http://www.stevemiller.net/puretext/ If nothing else, it’s a helluva lot quicker and easier than the Word/Notepad/TinyMCE shuffle.

    Do you want ANY formatting to come through from Word? Or are you using Word purely for spelling and grammar checking? If the former, it’ll be a bit of a challenge to find a tool that strips some formatting but not all. I wrote about PureText a while back and got an interesting comment that may or may not interest you (http://www.solo-technology.com/blog/2006/07/09/pure-text/; it’s the only comment on the article).

    Thanks Solo,

    PureText is one option that I’ll keep as an alternative, but it means my users have to download, install and learn it, including using hotkeys instead of the mouse and menus. I’m dealing with unsophisticated users here (no judgement on them; I’m just stating the fact that computers are a necessary evil in their lives, and the less they have to learn about them, the better).

    Incorporating XStandard would be the optimal solution, but that certainly seems too complicated for my coding skills 🙁

    As for whether to keep some formatting, I don’t know for sure what the users will be doing, but I suspect they might want to keep some of it. Though they can be retrained. Hmmm.

    If I were you, I would have them copy and paste their old files into WordPad, then copy and paste from there into WP. A lot, if not all, of the formatting will be removed. Teach them how to use WP’s built-in toolbar, which is as simple as writing an email. If they won’t learn that, then that’s a problem. They should be willing to try :)

    Hi vav,

    Before I proceed with my post, I’d like to note that this is my first post on this forum. I am proud to power my site with WordPress, and I am very thankful to this community for helping me tremendously in my WordPress world. Thanks, everybody.
    ————-

    I understand your problem. I also have some members who don’t like to mess with Word, WYSIWYG and HTML. And as far as I know, there is no way around it so far.

    But I approached this problem from a different angle. They don’t like HTML, but I know it, so I made my own quicktag buttons, each covering a function I know my members need. In short, I made our own Word-like, quicktags-based WYSIWYG editor.

    Since you know HTML, as you said, just decide what functions you need. If you need larger text, make a CSS class for it, put it in your template’s CSS file, then add a quicktag button that applies that class.

    Talk to your members to find out what they need. You set up the HTML editing once for everyone, and from time to time you improve it: delete unused buttons, add new ones.

    Here is a guide by Tamba on how to edit the quicktags: http://www.tamba2.org.uk/wordpress/quicktags/

    flakkito, yes, I can educate the users. I’ve been educating users for years. Frankly, I’m tired of *having* to educate users. Why not have a standards-compliant WYSIWYG editor that is easy to integrate? Oh, forgive the rant here; it’s a sore topic for me. I have done way too much “educating” of way too many users. It’s a fact of life, but the thing is, it really shouldn’t be that hard to copy/paste and strip the *#&*^&@*$() added code from MS.

    @kadmous, the issue isn’t so much styling once in WP, but being able to prepare content in Word, offline, and then add it to the site.

    “but the thing is, it really shouldn’t be that hard to copy/paste and strip the *#&*^&@*$() added code from MS.”

    Lots of commercial software packages have attempted this, but none of them seem that robust (they are only concerned with making MS Word work in their own application).

    This is an MS issue. Take a look at IE or Outlook, where little concern is shown for accepted standards or compatibility. Word was never designed for HTML; its inclusion was an afterthought (approached, as usual, with minimal consideration for compatibility except with other MS products).

    Yosemite, I realise it’s an Office issue, but XStandard does manage to strip the “uglies” from a simple copy/paste from Word.

    I guess the bottom line in this case is “I wish I knew enough PHP/JavaScript to integrate XStandard as the editor in WP.” The thing is, a few people discuss it here and there, but no one seems to have done it.

    <shrug>

    I have the same problem, but I was thinking of tackling it from a different angle.
    I was thinking of developing an XSL stylesheet to convert an ODT file (OpenOffice OpenDocument Text) into clean HTML. It is NOT that difficult, given some knowledge of XML. You can open any Word file in OpenOffice, and then use a custom XSL filter to get valid, clean HTML output. It would be very nice to have a ‘WordPress HTML’ export filter in OpenOffice.
    However, I have only just started, and I am a beginner. What do you think of this solution?
    Davide
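    A rough sketch of Davide’s idea in PHP, assuming the DOM extension is available. It swaps his XSL stylesheet for DOM/XPath, reads the `content.xml` part of an ODT file, and keeps only headings (`text:h`) and paragraphs (`text:p`). The function name `odt_content_to_html` is hypothetical. Pulling `content.xml` out of the .odt zip (for example with `ZipArchive`) is left out; lists and tables are flattened into plain paragraphs, and inline formatting and space markers such as `<text:s/>` are dropped.

```php
<?php
// Hypothetical sketch: convert an ODT content.xml string to bare HTML using
// DOM/XPath (in place of the XSL stylesheet proposed in the thread).
const ODF_TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

function odt_content_to_html(string $contentXml): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($contentXml)) {
        return ''; // not well-formed XML
    }
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('text', ODF_TEXT_NS);

    $out = array();
    // Headings and paragraphs, returned in document order.
    foreach ($xpath->query('//text:h | //text:p') as $node) {
        // textContent keeps the text of inline elements such as text:span
        // but drops their markup; empty text:s space markers add nothing.
        $text = htmlspecialchars(trim($node->textContent), ENT_QUOTES, 'UTF-8');
        if ($text === '') {
            continue; // skip empty paragraphs
        }
        if ($node->localName === 'h') {
            // ODF headings carry their level in text:outline-level; clamp to 1..6.
            $level = (int) $node->getAttributeNS(ODF_TEXT_NS, 'outline-level');
            $level = max(1, min(6, $level));
            $out[] = "<h$level>$text</h$level>";
        } else {
            $out[] = "<p>$text</p>";
        }
    }
    return implode("\n", $out);
}
```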

    Admittedly it’s been a while since I used MS Word, but can’t you create a document and then save it as plain text (.txt)? The only extra step for the user is a “Save As…”; not too arduous even for the technophobic.
    Presumably this would purely give out the content without MS formatting and allow a safe copy/paste into the WP WYSIWYG for final formatting.

    Or maybe I’m being naive…

    @tptboy, the problem with saving as text is that you then have to open Notepad, EditPad, or another text-only editor to do the copy/paste. Those are extra steps that I may not mind, but I know other people will quickly find them cumbersome. Opening the text file in MS Word leaves the silly MS formatting in when you copy/paste 🙁

    @nutsmuggler, the problem with your solution is, again, that it forces people to use a second or third application. OO is good software; I had it running at a non-profit I was involved with. But convincing an entire organisation to switch from one office suite to another is beyond what most web designers can do. Your solution might work well for you; sadly, I don’t think it’ll work for me in this case 🙁

    Excuse me, you said you are looking for a WYSIWYG editor that produces standards-compliant code. OpenOffice can be such an editor. You don’t need to force your customers to switch to OO: you just take their Word documents, open them in OO, and use the (alas, hypothetical) WordPress filter to produce clean, standards-compliant HTML for WordPress. OO is also free. The only drawback, at this stage, is that such a filter does not exist yet, but I’ll keep you posted.
    Davide

    “Presumably this would purely give out the content without MS formatting and allow a safe copy/paste into the WP WYSIWYG for final formatting.”

    This is exactly the approach I always recommend to people. Combined with the excellent Markdown (as a WP plugin), it gives my customers a hassle-free, easy way of creating content. Markdown is easy to learn, too. People usually pick it up in under 10 minutes, and once they learn it, they don’t forget.

    @nutsmuggler, sorry, I didn’t make myself clear. I am looking for a standards-compliant WYSIWYG editor that works inside WP, not a standalone one. Something like TinyMCE, which is currently WP’s WYSIWYG editor, only TinyMCE isn’t very good. Using a third application is not what I want.

    @pizdin, yes, that’s one approach that could be taken. But I come back to the fact that people who aren’t computer-savvy have better things to do than learn markup. Any markup. To you and me, a simple markup language like that is a breeze. But for a lot of people, it’s a different story.

    The bottom line is, it should be possible to go straight from a word-processing application, be it Word, WordPerfect or OpenOffice, and copy/paste content into a CMS’s editor window without losing formatting and without extraneous code being added. This is not a rant against WordPress, btw.

    Which brings me back to: Has anyone managed to integrate XStandard as the editor in WordPress? Maybe I should start a thread with that title… 🙂

    Perhaps you should look for the solution in a different place: instead of a standards-compliant WYSIWYG editor, try a standards-compliant HTML filter integrated directly into WordPress. I’m not sure how well that filter will deal with Microsoft’s proprietary tags, though. And it doesn’t have a WP plugin yet, although the API is so simple that I think writing one would be trivial.

    As for XStandard, it seems to be an application in its own right (not JavaScript), so it would require users to install something. Also, since it’s client-side, there’s no guarantee that the input reaching you will be compliant. You really ought to look for something server-side. If the server can transparently clean up the code, it doesn’t matter how bad or good the WYSIWYG editor is, as long as it doesn’t drop any tags.
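    A minimal sketch of the server-side approach, assuming a WordPress install where the `content_save_pre` filter is available: a regex pass that strips the usual Word leftovers (conditional comments, `<o:p>` tags, `Mso*` classes, inline styles and `<span>` wrappers) before the post is saved. `clean_word_html` is a hypothetical name, and regexes are no substitute for a real HTML filter like the one suggested above; this only illustrates the idea. As far as I know, content on that hook arrives slashed, so the wrapper unslashes and re-slashes it, the same way WordPress’s own kses filter does there.

```php
<?php
// Hypothetical helper: strip common MS Word paste leftovers from post HTML.
// Regex-based, so this is a rough sketch, not a real HTML sanitizer.
function clean_word_html(string $html): string
{
    // Word conditional comments, e.g. <!--[if gte mso 9]>...<![endif]-->
    $html = preg_replace('/<!--\[if[^\]]*\]>.*?<!\[endif\]-->/is', '', $html);
    // Office namespace tags such as <o:p> and </o:p>
    $html = preg_replace('/<\/?\w+:[^>]*>/', '', $html);
    // class="MsoNormal", class=MsoListParagraph, etc.
    $html = preg_replace('/\s+class=(?:"Mso[^"]*"|\'Mso[^\']*\'|Mso[^\s>]*)/i', '', $html);
    // Inline style attributes quoted with " or '
    $html = preg_replace('/\s+style=(?:"[^"]*"|\'[^\']*\')/i', '', $html);
    // All <span> open/close tags; the text inside them is kept
    $html = preg_replace('/<\/?span\b[^>]*>/i', '', $html);
    // Paragraphs left empty (only whitespace or &nbsp;)
    $html = preg_replace('/<p>(?:\s|&nbsp;)*<\/p>/i', '', $html);
    return trim($html);
}

// Only register the hook when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('content_save_pre', function ($content) {
        // Post content is slashed here: unslash, clean, then re-slash.
        return addslashes(clean_word_html(stripslashes($content)));
    });
}
```

    Stripping every `<span>` and `style` attribute also removes formatting an author may have wanted (colours, for instance); that matches the “clean HTML” goal in this thread, but it is a deliberate trade-off.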

    Sorry if this is resurrecting a dead topic.

    vavoom: I use a custom version of the Advanced WYSIWYG Editor plugin, modified to add the “Paste from Word” and “Paste as Plain Text” buttons. I show it in action in this help video. I tell my clients to “use the ‘W’” to paste from Word and to use the “clipboard ‘T’” to paste from everything else. I’m at the point where I’ve started removing the regular paste button (but of course they can still paste with Ctrl+V).

    Here are lines 34-39 of the “advanced-wysiwg.php” file from the plugin mentioned above:

    function extended_editor_mce_buttons_2($buttons) {
        return array(
            "cut", "copy", "paste", "pastetext", "pasteword", "undo", "redo", "separator",
            "table", "sub", "sup", "forecolor", "backcolor", "charmap", "separator",
            "code", "fullscreen", "wordpress", "wphelp", "cleanup" );
    }

    You also need to upload the “paste” folder from the TinyMCE distribution into WordPress’s TinyMCE plugins directory, so that it ends up at /wp-includes/js/tinymce/plugins/paste.
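    Bradley’s function replaces the whole second button row. A hypothetical variant (the name `add_paste_buttons` is illustrative, not from the plugin) instead inserts the two paste buttons into whatever row WordPress passes to the `mce_buttons_2` filter, which may survive upgrades with less reconfiguring. It still assumes the TinyMCE paste plugin is installed as described above.

```php
<?php
// Hypothetical variant: add TinyMCE's "pastetext" and "pasteword" buttons
// to the existing second row instead of replacing the whole row.
function add_paste_buttons(array $buttons): array
{
    $buttons = array_values($buttons); // make positions match keys
    $wanted = array('pastetext', 'pasteword');
    // Only add buttons that are not already in the row.
    $missing = array_values(array_diff($wanted, $buttons));
    $pos = array_search('paste', $buttons, true);
    if ($pos === false) {
        // No regular paste button: append at the end of the row.
        return array_merge($buttons, $missing);
    }
    // Place the new buttons right after the regular "paste" button.
    array_splice($buttons, $pos + 1, 0, $missing);
    return $buttons;
}

// Only register the hook when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('mce_buttons_2', 'add_paste_buttons');
}
```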

    If you’ve been through this, you know what I’m talking about. If you have no idea what I’m talking about, I can try to explain it better. There is a more in-depth explanation in this thread. I like this option because it’s just a small addition to the standard WP install, and I don’t have to reconfigure 47 things when I upgrade.

    Let us know how it goes.

    – Bradley

    Later: I just saw this plugin, which seems to do the same thing but may be easier to install?

  • The topic ‘Clean HTML from MS Word.’ is closed to new replies.