XML parsing not in chunks, parser out of memory

  • Hi

    This started as a problem importing a Blogger-exported XML file (16 MB) through the Blogger Importer plugin, but the issue seems to be in WP core, in the SimplePie parser. It shows up in the log as this warning:

    Warning: Invalid argument supplied for foreach() in <pathlocation>/wp-content/plugins/blogger-importer/blogger-importer.php on line 227

    but the actual memory error from core XML parsing doesn't bubble up.

    I started instrumenting the code and put together a hacky solution for my local build, and I believe it's a possible bug fix. I want to discuss whether it is indeed a bug, or whether I missed some flag or setting.

    Location – https://core.trac.wordpress.org/browser/trunk/src/wp-includes/SimplePie/Parser.php#L154

    The code looks something like this

    if (!xml_parse($xml, $data, true))
    {
        $this->error_code = xml_get_error_code($xml);
        $this->error_string = xml_error_string($this->error_code);
        $return = false;
    }

    This passes the whole XML document to the parser as a single chunk, and the parser errors out with “no memory”. From my googling, it seems the library has a hardcoded limit on chunk size.

    In my local install I changed it to chunked parsing, something like this, and it worked:

    $data_len = strlen($data);
    $data_offset = 0;
    $chunk_size = 4096000; // sleepy dev's 4MB

    while ($data_offset < $data_len)
    {
        $data_to_parse = substr($data, $data_offset, $chunk_size);
        $data_offset += $chunk_size;

        // Parse! The last argument marks the final chunk; >= also covers
        // data whose length is an exact multiple of $chunk_size.
        if (!xml_parse($xml, $data_to_parse, ($data_offset >= $data_len)))
        {
            $this->error_code = xml_get_error_code($xml);
            $this->error_string = xml_error_string($this->error_code);
            $return = false;
            break; // stop feeding chunks after the first parse error
        }
    }

    It's obviously hacky code and the WordPress devs would need to polish it up, but this would fix the XML parsing issue and, as a side effect, the Blogger Importer plugin.
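
    For discussion, the same idea can be sketched as a standalone function that runs outside WordPress. This is only a rough sketch under assumptions: the name parse_xml_in_chunks, the result array layout and the 1 MB default chunk size are hypothetical placeholders, not anything taken from core or SimplePie. It counts start tags just to show that the whole document was parsed.

```php
<?php
// Hypothetical helper, not part of WordPress or SimplePie.
// Feeds $data to PHP's XML parser in fixed-size pieces.
function parse_xml_in_chunks(string $data, int $chunk_size = 1048576): array
{
    $xml = xml_parser_create();
    $elements = 0;

    // Count start tags so the caller can see that parsing really happened.
    xml_set_element_handler(
        $xml,
        function ($parser, $name, $attrs) use (&$elements) { $elements++; },
        function ($parser, $name) {}
    );

    $data_len = strlen($data);
    $offset = 0;
    $result = ['ok' => true, 'elements' => 0, 'error' => null, 'line' => null];

    do {
        $chunk = substr($data, $offset, $chunk_size);
        $offset += $chunk_size;
        // True once the chunk just cut reaches the end of the data,
        // including when the length is an exact multiple of the chunk size.
        $is_final = ($offset >= $data_len);

        if (!xml_parse($xml, $chunk, $is_final)) {
            $result['ok'] = false;
            $result['error'] = xml_error_string(xml_get_error_code($xml));
            $result['line'] = xml_get_current_line_number($xml);
            break; // no point feeding more data after an error
        }
    } while (!$is_final);

    $result['elements'] = $elements;
    return $result;
}

// Usage: a tiny chunk size forces tags to be split across calls.
$good = parse_xml_in_chunks('<root><a/><b/><c/></root>', 5);
$bad  = parse_xml_in_chunks('<root><a></root>', 5);
```

    The do/while form makes sure xml_parse is called at least once with the final flag set, even for empty input, so the parser always gets a chance to report an incomplete document.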

    Is this really a bug, or did I just miss some memory setting somewhere? (Yes, I increased the PHP post size limit, file upload limits and PHP memory limit; it's not that.)

    Setup
    OS – Fedora 29
    Webserver – nginx
    WordPress version – 4.9.8 (clean install with no plugins except blogger-importer)

    php.ini settings (relevant)

    memory_limit = 2048M
    post_max_size = 200M
    upload_max_filesize = 200M
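
    Since PHP running behind nginx can load a different php.ini than the command line does, one way to confirm that these values are really in effect is to print them at runtime. A minimal sketch, meant to be run through the web server rather than the CLI; the variable names are placeholders:

```php
<?php
// Print the effective values of the settings listed above.
$settings = ['memory_limit', 'post_max_size', 'upload_max_filesize'];
$effective = [];
foreach ($settings as $name) {
    $effective[$name] = ini_get($name);
    echo $name, ' = ', $effective[$name], PHP_EOL;
}

// Show which php.ini file was loaded (false if none was found).
echo 'Loaded php.ini: ', var_export(php_ini_loaded_file(), true), PHP_EOL;
```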
    • This topic was modified 1 year, 2 months ago by amanmanglik.
    • This topic was modified 1 year, 2 months ago by amanmanglik. Reason: added php.ini settings and typo correction
Viewing 5 replies - 1 through 5 (of 5 total)
  • There are a few XML chunker apps out there that split oversized XML files down to something closer to what a normal XML import would be… something the XML parser should be able to handle.

    I figure coding up a one-off fix for a problem outside the normal use of some ‘service’ probably carries big risks of further uses/abuses down the road.

    I’ve only had to use those xml chunkers a few times.

    I’ve had to use the SQL chunkers quite often though… That’s where I first learned about all that kind of stuff. Moving and migrating sites is so much fun!!! About as much fun as getting two dissimilar sites (or even exact matches) to work well together.

  • Maybe, but you shouldn’t need to chunk your XML file. If I have increased the file upload size limit and given PHP enough memory, then it should be able to handle my XML. The documentation for PHP's xml_parse suggests calling it in a chunked manner, which the current code does not do. https://secure.php.net/manual/en/function.xml-parse.php

    Also in this case the parser error is getting suppressed and the actual warning in the error log is completely useless for troubleshooting.
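
    To illustrate what a more useful log entry could contain, here is a minimal sketch built only on PHP's standard xml_* functions. The helper name describe_xml_error and the sample input are hypothetical, and this is not how core currently reports the failure:

```php
<?php
// Hypothetical helper: builds a log-friendly message from a failed parser.
function describe_xml_error($parser): string
{
    $code = xml_get_error_code($parser);
    return sprintf(
        'XML parse error %d (%s) at line %d, column %d, byte %d',
        $code,
        xml_error_string($code),
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser),
        xml_get_current_byte_index($parser)
    );
}

// Deliberately malformed input: <entry> is never closed.
$parser = xml_parser_create();
if (!xml_parse($parser, '<feed><entry></feed>', true)) {
    $message = describe_xml_error($parser);
    error_log($message); // goes to the configured PHP error log
}
```

    A message like this in the log would point at the parser directly, instead of at the later foreach() warning in the plugin.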

  • Could you create an enhancement ticket at https://core.trac.wordpress.org/ please?
    You can provide a patch if you have one, and then people can test and improve, and get it into core.

  • Opened https://core.trac.wordpress.org/ticket/45303

    I opened it as a bug (defect) rather than an enhancement because the current behavior breaks reliable XML parsing and, in turn, the Blogger Importer plugin.

  • I saw that in #core on Slack.
    Thank you!

  • The topic ‘XML parsing not in chunks, parser out of memory’ is closed to new replies.