Support » Plugin: Blogger Importer » content is not imported

  • Resolved dawidge

    (@dawidge)


    I’m trying to import a blog from Blogger using version 0.5 of blogger-importer. It can connect and authenticate with Blogger, let me select a blog, and download all of the post data EXCEPT the post_content. I get author, date, title, tags, etc., but the actual content of the blog posts is empty. This also shows up when I view the raw database in phpMyAdmin. Except for the ‘Hello world’ sample post and the sample page, the entire post_content column is empty. The same behavior is observed with the comments: comment_content is completely empty.

    http://wordpress.org/extend/plugins/blogger-importer/

Viewing 8 replies - 1 through 8 (of 8 total)
  • Thread Starter dawidge

    (@dawidge)

    I have switched to the beta release you’ve pointed folks to in the other threads and am seeing better results. The import seems to stall late in the “Images” box, but I have been able to look at the raw data via phpMyAdmin. The post.post_content column is now being filled in, but comments.comment_content remains empty. Whenever I try to completely restart the import, it only allows me to “Continue”.

    Plugin Author Workshopshed

    (@workshopshed)

    Dawidge, this is not an issue I’ve seen before. There should be little difference between the two versions with regard to the content and comments.

    The comments are filtered, firstly by the “SimplePie” sanitisation process, which removes HTML, and then again by “wp_filter_comment”, which might be interacting with a plugin on your site.

    The images process does run very slowly on some systems and does not correctly update the progress bar but typically does finish.

    Thread Starter dawidge

    (@dawidge)

    Is there a good/easy way to debug the inputs and outputs from SimplePie? I’ve sampled some of the direct URLs to the comments from the metadata inside the database. I can fetch them with no difficulty and identify the content block, which is marked as html whether or not it actually contains any HTML. I do not see anything unusual inside the content that strikes me as something that would be filtered out by any of the sanitizers, either.

    I upgraded to 3.5.2, all plugins up to date (and inactive aside from blogger-importer). blogger-importer is the one from the 4010 ticket. Is there an even newer version I should be using?

    A lot of the images are missing from the Blogger site. We purged quite a bit, if not all, of the images when the Getty crackdown began. We want to bring the site in-house before we go through the effort of updating the image links.

    I have tried adjusting some of the settings in blogger-importer.php to see if I could coax it into skipping some things or behaving a bit better (increasing MAX_EXECUTION_TIME, disabling IMPORT_IMG, increasing REMOTE_TIMEOUT). Disabling image import speeds things up quite a bit, but it never completes the LINKS step to ask me about setting the author ID, and comments remain empty. In between attempts I leave the posts alone, but move-to-trash/empty-trash all the comments and stop/start mysql and apache to make sure that nothing remains in cache.

    Plugin Author Workshopshed

    (@workshopshed)

    For debugging I’ve been using the _log function in the main class.

    To use that you need to turn on WordPress debugging to file and call the _log function with your message or variable.

    The latest version is the one linked to the 4010 ticket.

    If you’ve got missing images or links then I can recommend using the Broken Link Checker after the import has completed successfully.

    The importer should be able to reprocess the links as you’ve described by deleting all the comments and re-running.

    Thread Starter dawidge

    (@dawidge)

    really tired of banging my head against this wall.

    Fresh install of 3.6
    No other plugins, no themes, pure vanilla 3.6+blogger-importer
    nuked DB and re-ran install.php
    patched/upgraded php, mysql, apache, etc.
    switched selinux into Permissive mode (things ran faster, so it was definitely interfering with something)

    when I try to import, I notice that the error log is filling up with
    PHP Warning: DOMDocument not found, unable to use sanitizer in /path/to/blogroot/wp-includes/SimplePie/Sanitize.php on line 252, referer: http://bloghost.domain/wp-admin/admin.php?import=blogger&token

    I counted (well, grep|wc) and there is one error line like this in my error log for each comment entry. All of the comment metadata (author, timestamp, etc.) gets imported, but the content of the comment is empty.
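
The grep|wc count mentioned above can be sketched as a small shell function. This is only an illustration: the name count_dom_warnings is hypothetical, and the path of the error log depends on how Apache and PHP are configured on the host.

```shell
# Count PHP warnings about the missing DOMDocument class in an error log.
# count_dom_warnings is a hypothetical helper name, not part of any tool.
# Usage: count_dom_warnings /path/to/error_log
count_dom_warnings() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when nothing matches, so "|| true" keeps the function's status at 0.
    grep -c 'DOMDocument not found' "$1" || true
}
```

Comparing the number this prints with the number of imported comments is one way to confirm that every empty comment corresponds to one sanitizer warning.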

    I tried importing to wp.com (worked) and exporting to WXR, then using wordpress-importer. That simply hangs, appearing to do nothing, but it produces similar DOMDocument/sanitize errors.

    Am I missing some critical thing that SimplePie is expecting to be there?

    I have looked at a couple of the actual comments by fetching the wp_commentmeta.meta_value (i.e. /feeds/….) URL from blogspot, and the content block of the comment is pretty vanilla (specified as type html, but containing straight text with a couple of ampersand-plus-two-letter special character sequences).

    Thread Starter dawidge

    (@dawidge)

    yum install php-xml

    import seems to be progressing now and comments are being imported without errors. it hasn’t prompted me for author assignment, yet, but I have high hopes.

    Plugin Author Workshopshed

    (@workshopshed)

    That’s great news, hope it goes OK. I’ve added php-xml to the prerequisites section of the readme file.

    Thread Starter dawidge

    (@dawidge)

    Success! Also note that Apache needs a restart for PHP to reload all of its config.
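
To check for the missing piece before running an import, the list of loaded PHP modules can be inspected for "dom", the extension that provides the DOMDocument class and that the php-xml package installs on yum-based systems. The sketch below is an assumption-laden illustration: the function names has_dom_module and report_dom_module are hypothetical, and the functions read the module list from standard input rather than calling PHP themselves.

```shell
# Succeeds (exit status 0) when the module list on stdin contains "dom".
# Both function names here are hypothetical helpers for illustration.
has_dom_module() {
    # -q: no output, -i: ignore case, -x: the whole line must match
    grep -qix 'dom'
}

# Print a one-line verdict for the module list on stdin.
report_dom_module() {
    if has_dom_module; then
        echo "dom extension loaded"
    else
        echo "dom extension missing: install php-xml and restart Apache"
    fi
}
```

A typical call would be `php -m | report_dom_module`. Note that the command-line PHP and the PHP running inside Apache can load different configuration, so a restart of Apache (for example `service httpd restart`, assuming a RHEL-style system) is still needed after installing the package.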

  • The topic ‘content is not imported’ is closed to new replies.