Error in exporting/importing multi-encoding blog content (9 posts)

  1. mhh1422
    Member
    Posted 7 months ago #

    Hi there,

    I've used a wordpress.com hosted blog (http://mhh1422.wordpress.com) for a long time and posted to it in both English and Arabic. Later, I decided to move to a self-hosted WordPress blog (http://mhh1422en.al-shehab.com).

    I exported my old blog's content using tools->export and then used the tools->import option in the new blog. Unfortunately, the Arabic posts, comments, and so on got the wrong encoding and appear as unreadable characters. You can compare the two sites, http://mhh1422.wordpress.com (the old) and http://mhh1422en.al-shehab.com (the new), to get the idea.

    I've tried opening the exported XML file in the browser and switching the encoding from windows-1256 to others such as utf8, windows-1252, and windows-1251, but with no result.

    I opened a topic on forums.wordpress.com at this link: http://en.forums.wordpress.com/topic/error-in-exportingimporting-multi-encoded-blog-content?replies=7 and they advised me to raise the issue here.

    Can anyone help me resolve this issue?

    Thanks very much

  2. ChristiNi
    Member
    Posted 7 months ago #

    Hi mhh1422,

    From what I can see through my research, your database should have UTF-8 encoding for Arabic characters to render properly. Usually when you see issues like this, it's due to encoding. Here's information on how to create a quick and easy PHP script that will change your database encoding for you:

    How to Convert a Database to UTF-8

    Hope this helps!
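
    The linked script is not reproduced in this thread. As a rough sketch of what such a conversion amounts to, the Python below only builds the MySQL statements and prints them; it does not connect to a database. The table names are an assumed subset of a default install with the "wp_" prefix, and the helper name conversion_statements is made up for illustration.

```python
# Sketch: build the SQL that converts each table to UTF-8.
# Assumption: default WordPress table names with the "wp_" prefix.
TABLES = ["wp_posts", "wp_comments", "wp_options"]  # illustrative subset


def conversion_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Return one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for name in tables:
        # Backticks quote the identifier, so reject names that contain one.
        if "`" in name:
            raise ValueError(f"unsafe table name: {name!r}")
        statements.append(
            f"ALTER TABLE `{name}` CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    for sql in conversion_statements(TABLES):
        print(sql)
```

    Running statements like these on empty tables is the simple case. On tables that already hold mis-encoded text, a plain conversion can garble the data a second time, which is what the longer Codex procedure is meant to avoid.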

  3. mhh1422
    Member
    Posted 7 months ago #

    Hi ChristiNi,

    I emptied the DB, ran the script, and checked the table, all of them were latin and changed to utf8, then I imported everything again.

    Unfortunately, the problem still exists...

  4. ChristiNi
    Member
    Posted 7 months ago #

    Hello again mhh1422,

    Just to be clear, when you said

    checked the table, all of them were latin and changed to utf8

    Did you run the script and it changed the tables or did you try changing them from within phpMyAdmin?

    Check the DB_CHARSET property in your wp-config.php file as well. Per the codex:

    the DB_CHARSET property defines the format of content sent to your database and the expected format of content retrieved from it. It does not alter the format of existing tables, so if you have tables formatted with a different character set from the one in DB_CHARSET the results will be erratic both in terms of fetching and saving text.

    This article should be helpful as well:

    http://codex.wordpress.org/Converting_Database_Character_Sets
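
    To see why a mismatch like the one the Codex describes gives erratic results, here is a small Python sketch. It does not touch WordPress or MySQL; it only shows what happens when UTF-8 bytes are read back under a different character set. The function name garble and the sample word are made up for illustration.

```python
# Sketch: what a charset mismatch does to Arabic text.
def garble(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then misread the bytes as another encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


# "marhaba" (hello) in Arabic, written with escapes to keep the source ASCII.
original = "\u0645\u0631\u062d\u0628\u0627"
garbled = garble(original)

# Each Arabic letter takes two bytes in UTF-8, so the misread text
# has twice as many characters as the original.
print(len(original), len(garbled))
print(garbled == original)
```

    As long as every byte survives, the damage is reversible: encoding the garbled text back with the wrong character set and decoding it as UTF-8 returns the original.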

  5. mhh1422
    Member
    Posted 7 months ago #

    Hi ChristiNi,

    I emptied the tables and ran the script you gave me; all the tables have been converted to utf8_general_ci.

    As for the other point, I checked the config file and it is utf8.

    I think I need to know the charset and collation of the tables on wordpress.com, because I've tried different combinations of charsets and encodings for the tables and the page, and none of them helped!

  6. mhh1422
    Member
    Posted 7 months ago #

    Also, I think I should know the DB_CHARSET of wordpress.com connection.

  7. ChristiNi
    Member
    Posted 7 months ago #

    Hi mhh1422,

    Do the characters show up correctly in the database when viewing the tables in phpMyAdmin?

  8. mhh1422
    Member
    Posted 7 months ago #

    No, they don't. They don't even show up correctly in the exported XML file! I've tried different encodings for viewing the file and all attempts failed.
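
    Switching the viewing encoding will not help if the text was stored as UTF-8 bytes misread under another character set. A possible check, sketched in Python below, is to reverse that step with a few candidate encodings and see whether Arabic letters come back. The candidate list and the names try_repair and is_arabic are assumptions for illustration, not anything the WordPress exporter provides.

```python
# Sketch: try to undo "UTF-8 read as something else" garbling.
CANDIDATES = ["cp1252", "latin-1", "cp1256"]  # assumed guesses, not exhaustive


def is_arabic(ch):
    """True if the character falls in the basic Arabic Unicode block."""
    return "\u0600" <= ch <= "\u06ff"


def try_repair(garbled, candidates=CANDIDATES):
    """Return (encoding, repaired_text) for the first candidate that
    yields Arabic letters, or None if no candidate does."""
    for enc in candidates:
        try:
            repaired = garbled.encode(enc).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this candidate cannot explain the garbling
        if any(is_arabic(ch) for ch in repaired):
            return enc, repaired
    return None
```

    If this returns None for a sample copied from the XML file, or if the file contains literal question marks where the Arabic should be, the original bytes were probably lost before the export and cannot be recovered on the importing side.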

  9. ChristiNi
    Member
    Posted 7 months ago #

    I'm sorry to hear that, mhh1422. I know you said yesterday that the wordpress.com forums recommended you post the question here, but the issue is back at wordpress.com. If your XML file is coming out with incorrect characters, something is going wrong with your export over there.

    I ran quite a few searches on both wordpress.com and Google, and the only other suggestion I have at this point is to check your XML file for an XML declaration. If it declares something other than UTF-8, that could be the issue. If there is no XML declaration, the file would rely on the presence of the Byte-Order-Mark (BOM) to dictate the encoding.

    I found these two resources to be helpful for understanding a bit more about XML and encoding:

    http://www.opentag.com/xfaq_enc.htm

    http://msdn.microsoft.com/en-us/library/aa468560.aspx

    Let me know how this progresses, I hope this is resolved for you quickly!
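
    The declaration and BOM check suggested above can be done by hand in a text editor, or with a few lines of Python such as the sketch below. The function name sniff_xml_encoding is made up, and the sketch is deliberately limited: it recognises only the UTF-8 and UTF-16 BOMs and looks for the declaration only in ASCII-compatible files.

```python
import codecs
import re

# Byte order marks this sketch recognises (UTF-32 is left out on purpose).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches: <?xml version="1.0" encoding="..."?> at the start of the data.
DECL_RE = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def sniff_xml_encoding(head):
    """Return (bom_encoding, declared_encoding) for the first bytes of a file.
    Either value is None when the corresponding marker is absent."""
    bom = None
    for mark, name in BOMS:
        if head.startswith(mark):
            bom = name
            head = head[len(mark):]
            break
    match = DECL_RE.match(head)
    declared = match.group(1).decode("ascii") if match else None
    return bom, declared
```

    To use it, open the export in binary mode, read the first couple of hundred bytes, and pass them in. If both values come back as None, an XML parser will assume UTF-8.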
