WordPress.org



Blogger Importer
a few bugs (pages, tags, draft posts) (16 posts)

  1. katazina
    Member
    Posted 12 months ago

    What I found after importing from Blogger:
    - pages are not imported
    - draft posts are imported, but their titles are not: the content is fine, but the title becomes "(no title)"
    - tags are imported as categories
    - images: I had two images uploaded on Blogger. Both were imported correctly, but two other JPGs were also imported, titled "(no title)" and with no picture, so after the import I had four images in the WordPress media library.
    I think the images are duplicated during the import: one copy imports correctly and the duplicate does not.

    Thanks if you can fix these!
    Kata

    https://wordpress.org/plugins/blogger-importer/

  2. Workshopshed
    Member
    Plugin Author

    Posted 12 months ago

    Kata, thanks for your feedback.
    Pages are not currently supported by the version of the API that the importer uses. Switching to the new API means changing the security protocol from OAuth to OAuth 2.0, so it's not a trivial fix.
    I've imported lots of draft posts without any problems with their titles, so that's quite strange. Is there anything specific about the titles, perhaps?
    Blogger does not distinguish between categories and tags in its labels, so yes, the labels are imported as categories. There are tools to convert those to tags. If they were loaded as tags, I'm not sure there is a tool to move them in the other direction.
    There's not much to go on for your images issue. Do you know how the HTML was represented in the source post?
    Cheers,
    Andy

  3. katazina
    Member
    Posted 12 months ago

    Thanks for your answer!

    If you can give me an email address, I can add you to this blog so you can check it yourself.

    The draft post title has accents (áéíöüóőúű). I created a new draft post without accents and its title imported correctly. Published posts don't have this issue.

    The previous version of the Blogger importer had a bug with accents in images: if an image name had accents, that image was not imported.
    I tested it just now and it is imported, but its name in the media library looks URL-encoded: st-C3-B3ck-p-C3-BAzzl-C3-A9p-C3-AD-C3-A9c-C3-A9as
    :)

    Image path in the html:
    <img border="0" src="http://1.bp.blogspot.com/-RrZQGkOsd34/TbFb3nNiKLI/AAAAAAAABmE/n0DbLrxYBWs/s320/3D+Best+wallpapers+1024+x+768++%25282%2529.jpg" height="240" width="320" />

  4. Workshopshed
    Member
    Plugin Author

    Posted 12 months ago

    Great diagnostics; it looks like the accents are the key to the issues. I'll add something similar to my test blog and let you know if I can't reproduce it there.
    The image code does some filtering on the filenames and replaces potentially problematic characters with a hyphen (-).
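    The media library name Kata reported is consistent with a two-step transformation: percent-encode the UTF-8 bytes of the name (so "ó" becomes %C3%B3), then replace each % with a hyphen. The Python sketch below reproduces that pattern. It is an assumption about the mechanism, not the importer's actual (PHP) code, and `hyphenated_name` is a hypothetical helper.

```python
from urllib.parse import quote


def hyphenated_name(name: str) -> str:
    """Hypothetical reconstruction of the observed filename mangling.

    Assumption: the name is percent-encoded as UTF-8 (e.g. "ó" -> "%C3%B3")
    and every "%" is then replaced with "-". This is not the importer's code.
    """
    return quote(name, safe="").replace("%", "-")


print(hyphenated_name("stóck"))   # st-C3-B3ck
print(hyphenated_name("púzzlé"))  # p-C3-BAzzl-C3-A9
```

    Both outputs match fragments of the reported name, which suggests the accented characters survive only as their UTF-8 byte values rather than being transliterated (for example, "ó" to "o").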

  5. Workshopshed
    Member
    Plugin Author

    Posted 11 months ago

    I successfully imported a draft post titled "Le foreign post áéíöüóőúű" with no issue. I haven't tried this with an image yet.

  6. katazina
    Member
    Posted 11 months ago

    I created a new WordPress blog, imported from Blogger again, and got the same error.
    Title on Blogger: hallihow Télapó
    The content is the same as the title, plus an image.
    This post was never published; I saved it as a draft.
    My blog language is Hungarian.

  7. Workshopshed
    Member
    Plugin Author

    Posted 11 months ago

    I tried that title and it also worked for me. I tried switching Blogger's language to Hungarian, and that made no difference either.
    In your WordPress config file, what is define('WPLANG', ''); set to?
    Do you have the PHP XML module installed for Apache?

  8. Samuel Wood (Otto)
    Tech Ninja
    Plugin Author

    Posted 11 months ago

    Workshopshed: find out the character set of the underlying database that the data is being imported into. Accents can behave oddly with different MySQL character sets. You may need to use some iconv trickery here.

  9. Workshopshed
    Member
    Plugin Author

    Posted 11 months ago

    Cheers, Otto. Can WordPress support UTF-8? That should handle all of the European languages.
    Katazina, could you find out the database character set? Then I'll see if I can reproduce it.

  10. Samuel Wood (Otto)
    Tech Ninja
    Plugin Author

    Posted 11 months ago

    UTF-8 is naturally preferred by WordPress, but it doesn't enforce the character set because it doesn't know the character set of random data that you give it.

    So, if your database tables are set to UTF-8 but the data is not (say it's ISO-8859-1), and the data contains characters that are invalid in UTF-8, inserting it into the database can lead to bad results, because MySQL rejects the string as non-UTF-8 (or, worse yet, truncates it at the first non-UTF-8 character).

    Unfortunately, it's often a guess as to how the data is encoded. You have to look at the data's binary representation and work it out. It's possible that Blogger is returning ISO-8859-1 data, which will work fine if your table is not UTF-8 but will fail if it is. Or vice versa. Hard to say. This is why the details of the specific case matter.

    Ideally, WordPress creates UTF-8 tables. WordPress has used UTF-8 as the default for a very long time, but at one point it did not. If it's a new install, it should have UTF-8 tables, which means that if the data is not UTF-8, you need to convert it. This also means that if your particular test install is old or not using UTF-8 tables for whatever reason, you might not see the same problems when inserting seemingly identical data.

    So look at the character set on your test bed too, and try to examine the binary form of whatever Blogger is sending back.
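    As a rough illustration of examining the binary form, the Python sketch below prints the bytes of "Télapó" in ISO-8859-1 and in UTF-8, then converts the former to the latter, much as an iconv conversion from ISO-8859-1 to UTF-8 would. The helper `to_utf8` and its fallback encoding are assumptions for illustration. For Hungarian text, ISO-8859-2 is a likelier legacy encoding than ISO-8859-1, because ő and ű do not exist in ISO-8859-1.

```python
def to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return `raw` as UTF-8 bytes (hypothetical helper).

    If `raw` already decodes as UTF-8 it is returned unchanged; otherwise it
    is assumed to be in `fallback` and re-encoded. The fallback is a guess.
    """
    try:
        raw.decode("utf-8")  # valid UTF-8: pass through unchanged
        return raw
    except UnicodeDecodeError:
        # Not valid UTF-8, so reinterpret it in the assumed legacy encoding
        return raw.decode(fallback).encode("utf-8")


latin1 = "Télapó".encode("iso-8859-1")
utf8 = "Télapó".encode("utf-8")

print(latin1.hex(" "))          # 54 e9 6c 61 70 f3
print(utf8.hex(" "))            # 54 c3 a9 6c 61 70 c3 b3
print(to_utf8(latin1) == utf8)  # True
```

    Two caveats apply. Every byte sequence decodes as ISO-8859-1, so the fallback never fails even when the guess is wrong. A string that happens to decode as UTF-8 is also not proof that it was meant as UTF-8. This is why the guess has to be checked against the actual data.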

  11. katazina
    Member
    Posted 11 months ago

    PHP XML module: installed and enabled
    WP table collation: utf8_general_ci
    DB_CHARSET: utf8
    WPLANG: hu_HU

  12. Workshopshed
    Member
    Plugin Author

    Posted 11 months ago

    Looking at the Blogger feed, it is UTF-8 too:
    <?xml version='1.0' encoding='UTF-8'?>
    I tried swapping WPLANG and that made no difference either, although you could see which plugins have been translated and which haven't.

    You could try turning on logging to a file and adding some logging to the import class.

    In blogger-importer-blog.php, locate the function import_posts.

    After line 145,
    $blogentry->categories = $item->get_categories();

    add this new line:
    Blogger_Import::_log($blogentry);

    That should dump the data parsed from Blogger into the log. This should tell us whether the parser is failing or the problem comes at the next stage, when the data is written to the DB.

    Test this with a small blog, as it will put a lot into your log.

  13. katazina
    Member
    Posted 11 months ago

    The line you wrote doesn't work for me, but I added:
    error_log(serialize($blogentry));

    With var_dump, for the draft post (with accents) it gave:

    public $title =>
      string(0) ""

  14. Workshopshed
    Member
    Plugin Author

    Posted 11 months ago

    So either it's not in the feed from Google, or it's being dropped by the parser.
    If you look at the source of your Blogger blog's home page, you should see a link that looks like this:

    <link rel="service.post" type="application/atom+xml" title="Mini Blog - Atom" href="http://www.blogger.com/feeds/417730729915399755/posts/default" />

    If you navigate to the address in that link, you should see your posts, including the draft post. Does it look OK there?

    Do you want to share that address? I'll see if my test site can parse it.
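    To check a saved copy of the feed for entries with an empty title, a short script using Python's standard ElementTree parser is enough. This is a sketch: the function name `untitled_entries` and the two-entry sample feed are hypothetical, and it assumes the feed uses the standard Atom namespace (http://www.w3.org/2005/Atom).

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def untitled_entries(feed_xml: bytes) -> list[str]:
    """Return the <id> of every Atom <entry> whose <title> is missing or blank."""
    root = ET.fromstring(feed_xml)
    missing = []
    for entry in root.iter(f"{ATOM}entry"):
        title = entry.find(f"{ATOM}title")
        # An empty <title></title> element has .text == None
        if title is None or not (title.text or "").strip():
            missing.append(entry.findtext(f"{ATOM}id", default="?"))
    return missing


# Hypothetical two-entry feed; a real Blogger feed carries many more elements.
sample = """<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>post-1</id><title type="text">hallihow Télapó</title></entry>
  <entry><id>post-2</id><title type="text"></title></entry>
</feed>""".encode("utf-8")

print(untitled_entries(sample))  # ['post-2']
```

    If the draft's entry appears in this list, the title is already empty in the data that Blogger returns, before the importer sees it.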

  15. katazina
    Member
    Posted 11 months ago

    The title is empty there too, so it's probably Google's fault.
    Thanks for looking into it.

  16. Workshopshed
    Member
    Plugin Author

    Posted 11 months ago

    Sorry I can't be of more help with this.
