WordPress.org

Forums

WP can't export/import DOS line endings ... are they illegal in posts? (6 posts)

  1. mykle
    Member
    Posted 1 year ago #

    Hi,

    I'm migrating some WP data from one install to another for a client, using WordPress Export/Import. I export a WXR file from one WP instance, import it in another. Both instances are identical -- same servers, same code version.

    Most of my client's posts contain DOS line endings -- CRLF, aka \r\n, aka ^M^J -- in the post_content, and also in serialized strings in postmeta.

    These DOS line endings appear in the WXR file as raw control characters. They're not XML-escaped or processed in any other way, which seems fishy. And when this WXR file is parsed by the importer, those carriage returns are stripped out. This subtly alters the post_content, but it wreaks havoc with the serialized postmeta data, which refuses to import.
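One possible explanation for the stripped carriage returns: the XML 1.0 specification (section 2.11, End-of-Line Handling) has a conforming parser turn every literal CRLF into a single LF before the application sees the text, while a character reference such as `&#13;` is passed through unchanged. The sketch below shows that behaviour with Python's standard-library parser. It is an illustration only: the importer is PHP and uses its own XML parser, and the one-element `<content>` documents here are hypothetical stand-ins, not the real WXR schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element stand-ins for a WXR field (not the real WXR schema).
raw = b"<content>line one\r\nline two</content>"
cdata = b"<content><![CDATA[line one\r\nline two]]></content>"
escaped = b"<content>line one&#13;\nline two</content>"


def parsed_text(xml_bytes):
    """Return the text content of the root element after parsing."""
    return ET.fromstring(xml_bytes).text


for label, doc in [("raw", raw), ("cdata", cdata), ("escaped", escaped)]:
    # repr() makes any surviving \r visible in the printed output.
    print(label, repr(parsed_text(doc)))
```

If the same normalization applies on import, a literal CR written into the export file cannot survive the round trip, in element text or inside a CDATA section, unless the exporter writes it as a character reference.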

    Before I report this as a bug, I want to make sure that my client's not doing something wrong with their custom code. Is it considered illegal or "impossible" to have CR characters in the WordPress database? Are they filtered out somewhere in the wp_insert_post process?

    Thanks for any insight,
    -mykle-

  2. esmi
    Forum Moderator
    Posted 1 year ago #

    Those line endings should be fine but I wonder if your client has been pasting in content from elsewhere and also pasted in some other non-printable characters.

  3. mykle
    Member
    Posted 1 year ago #

    As far as I can tell, the content comes from plain old text fields in web forms.

  4. esmi
    Forum Moderator
    Posted 1 year ago #

    Presumably these forms were added via plugins, yes? I've seen data added by plugins really mess up an export file before now. :-(

  5. mykle
    Member
    Posted 1 year ago #

    Yeah, but shouldn't a person be able to store a carriage return in a post's content or metainfo if they want to? I'm specifically asking if this is documented as illegal somewhere. It seems entirely reasonable to me.

    If a post's meta_info contains a carriage return, and I use the stock WP export and import to transport that data, the meta_info will be lost. That's not plugin code, that's stock WP 3.4.2.
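A rough sketch of why a lost carriage return would be fatal for serialized postmeta in particular: PHP's serialize() format records the byte length of every string (`s:<length>:"<payload>";`), so removing even one byte from the payload leaves the declared length wrong. The helper names below (`php_serialize_string`, `declared_length_matches`) are hypothetical and cover only the single-string case. This is not WordPress or PHP code.

```python
import re

# Matches one PHP-serialized string of the form s:<byte length>:"<payload>";
_SERIALIZED_STRING = re.compile(rb's:(\d+):"(.*)";', re.DOTALL)


def php_serialize_string(value: bytes) -> bytes:
    """Hypothetical helper: build the PHP serialize() form of a single string."""
    return b's:%d:"%s";' % (len(value), value)


def declared_length_matches(serialized: bytes) -> bool:
    """Check that the declared byte length equals the actual payload length."""
    match = _SERIALIZED_STRING.fullmatch(serialized)
    if match is None:
        raise ValueError("not a single serialized string")
    return int(match.group(1)) == len(match.group(2))


original = php_serialize_string(b"first\r\nsecond")
# Simulate the round trip described in this thread: CRLF becomes LF.
after_import = original.replace(b"\r\n", b"\n")

print(declared_length_matches(original))
print(declared_length_matches(after_import))
```

The second check fails because the payload is one byte shorter than its header claims, which is the kind of mismatch that makes PHP's unserialize() reject a value.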

    I've confirmed that my client is using nothing trickier than update_post_meta() to store the carriage return. update_post_meta() performs some sanitizing and escaping, but it leaves carriage returns untouched.

    I think this is a WP bug.

  6. esmi
    Forum Moderator
    Posted 1 year ago #

    If you can replicate the bug using Twenty Eleven with no plugins active, then you could try reporting it as a bug in Trac. But every time I've seen this, the .xml file has been malformed.

    Have you tried importing the theme unit test data to confirm that the issue is in your export file?
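The malformed-file possibility can be checked separately from the line-ending question. A minimal sketch, assuming Python's standard-library parser is an acceptable stand-in for the PHP one used by the importer; `is_well_formed` is a hypothetical helper and the sample documents are made up:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


# Made-up samples, not real WXR content.
with_crlf = b"<item><title>a\r\nb</title></item>"       # raw CRLF in text
with_control = b"<item><title>a\x01b</title></item>"    # other control character
truncated = b"<item><title>a\r\nb</title>"              # missing closing tag

for label, doc in [("crlf", with_crlf), ("control", with_control), ("truncated", truncated)]:
    print(label, is_well_formed(doc))
```

A raw CRLF on its own leaves a file well-formed, whereas most other control characters (such as 0x01) are not allowed in XML 1.0 and make the parse fail. That would distinguish the pasted-in non-printable characters case from the plain carriage-return case.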

Topic Closed

This topic has been closed to new replies.