conflicting character encoding, wordpress vs mysql

  • Back in December last year I set up a bilingual English/Korean blog. At the time I was in a real hurry, and since I had some problems with UTF-8, I resorted to setting the WordPress character encoding to EUC-KR and everything seemed to work fine.

    Now, with WordPress 2.0.3, I’ve found that UTF-8 works well with Korean input (I tried it on a fresh installation), so I’d really like to move my existing blog over to UTF-8. However, I’m having some real problems.

    First I tried this technique: SSH into my server, use mysqldump to create an SQL file, and then run iconv to convert it from EUC-KR to UTF-8. But iconv simply refused to convert the file, truncating the output as soon as it reached a Korean character. Next, I FTPed the SQL file onto a Windows machine and tried to open it with various editors (including Notepad++ and Visual Studio) so that I could save it again as UTF-8. Then I realised that the SQL file I had exported was already UTF-8…
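For anyone wanting to confirm from the shell that a dump is already UTF-8, here is a minimal sketch. The function name `is_utf8` is made up for illustration; it relies only on iconv failing when its input is not valid in the source encoding. Bear in mind that a file can be valid UTF-8 at the byte level and still contain scrambled text, so this only tells you how the bytes are encoded, not whether the characters are the right ones.

```shell
# is_utf8 FILE: succeed (exit 0) if FILE is valid UTF-8, fail otherwise.
# Hypothetical helper, not part of any tool mentioned in this thread.
is_utf8() {
    # Converting UTF-8 to UTF-8 changes nothing, but iconv still
    # validates the input and exits non-zero on an illegal sequence.
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

Usage would be something like `is_utf8 dump.sql && echo "already UTF-8"`.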

    In fact, I have discovered that my MySQL server doesn’t have any Korean-specific character sets or collations installed. So MySQL is treating everything as UTF-8, but my WordPress character set is EUC-KR! I’m surprised that it’s been working at all.

    But how do I convert this across???

    I even tried writing some PHP code to pull some Korean text out of the database, translate it, and put it back. Here’s the experimental code that I wrote. It operates on a database table created by one of my WordPress plugins:


    // Experimental: re-encode the translated names from EUC-KR to UTF-8 in place.
    function perform_maintainence() {
        global $wpdb;
        $sql = "SELECT * FROM $this->table_name;";
        $rows = $wpdb->get_results($sql);
        foreach ($rows as $row) {
            // Treat the stored bytes as EUC-KR and convert them to UTF-8.
            $utf8name = iconv("EUC-KR", "UTF-8", $row->translated_name);
            $utf8name = $wpdb->escape($utf8name);
            $sql = "UPDATE $this->table_name SET translated_name='$utf8name' WHERE cat_name='$row->cat_name' AND locale='$row->locale';";
            $wpdb->query($sql);
        }
    }

    The results are patchy. After the conversion, half of the Korean syllables do appear correctly, but the others remain scrambled. And this just makes me even more confused.

    What makes things harder is that I can’t view Korean syllables when I SSH in with PuTTY, and phpMyAdmin also mangles the Korean text.

    Any advice would be greatly appreciated!!!

  • Thread Starter graphox

    (@graphox)

    Asssssssa! I’ve cracked it. It’s taken me several frustrating hours.

    I was helped by this poor guy who apparently had an even harder time: Turning MySQL data in latin1 to utf8

    My solution was this:

    First use mysqldump to export the database content in latin1 encoding. Yes, latin1! i.e.:

    mysqldump --default-character-set=latin1 ... > dump.sql

    Then use iconv to convert the file, but force it to read the file as if it were actually EUC-KR.

    iconv -f EUC-KR -t UTF-8 dump.sql > utf8_dump.sql

    (Bear in mind that on your system the precise encoding names may differ from the ones above; run ‘iconv -l’ to get a listing.)
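To see why the latin1 trick works, here is a small round trip on a sample string. This is only a sketch under assumptions: the sample text is made up, and ISO-8859-1 stands in for MySQL's latin1 (the two agree on every byte EUC-KR uses for Korean characters). A utf8 dump treats each stored EUC-KR byte as a separate latin1 character and re-encodes it; dumping as latin1 undoes that and hands back the raw bytes, which can then be read as EUC-KR.

```shell
# Hypothetical sample text; any Korean string covered by EUC-KR will do.
original='한국어'

# What a utf8 dump shows: the EUC-KR bytes the blog stored, each one
# misread as a latin1 character and then re-encoded as UTF-8.
mojibake=$(printf '%s' "$original" \
    | iconv -f UTF-8 -t EUC-KR \
    | iconv -f ISO-8859-1 -t UTF-8)

# What the latin1 dump plus iconv does: turn the characters back into
# raw bytes, then read those bytes as the EUC-KR they always were.
recovered=$(printf '%s' "$mojibake" \
    | iconv -f UTF-8 -t ISO-8859-1 \
    | iconv -f EUC-KR -t UTF-8)

printf 'as seen in a utf8 dump: %s\n' "$mojibake"
printf 'recovered:              %s\n' "$recovered"
```

The second pipeline is the same two steps as the mysqldump and iconv commands above, applied to one string instead of a whole dump.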

    iconv may also produce some errors if your dump contains characters that are not valid EUC-KR. For me this only happened on tables that were being used for caching (by one of my plugins), so I simply deleted the table from the SQL file and it converted perfectly.
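If you need to track down which parts of the dump iconv is choking on, something along these lines may help. It is a rough sketch: `find_bad_lines` is a made-up helper name, and it runs iconv once per line, so it will be slow on a big dump. Splitting on newlines is safe here because both bytes of a two-byte EUC-KR character are 0xA1 or above, so a newline byte never falls inside a character.

```shell
# find_bad_lines FILE: print the number of every line in FILE that
# iconv rejects as EUC-KR. Hypothetical helper for illustration only.
find_bad_lines() {
    n=0
    # The "|| [ -n ... ]" part also handles a last line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if ! printf '%s\n' "$line" | iconv -f EUC-KR -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$n"
        fi
    done < "$1"
}
```

Running `find_bad_lines dump.sql` then lists the offending line numbers, which you can match up against the table names in the dump.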

    Next I opened the mysql client from my Unix shell and tried to ‘source’ the SQL file (by this point I’d managed to configure PuTTY properly, so that Korean text and UTF-8 were working). I eventually realised that it was the mysql Unix client that was causing the problems. So I imported the dump file using phpMyAdmin instead and, hey presto, the Korean syllables were appearing correctly in phpMyAdmin.

    However, in my blog things looked worse than ever. I just had a mass of question marks.

    This was easy to resolve though, as the solution has already been posted elsewhere. You must edit the file wp-includes/wp-db.php to add an extra MySQL query after the database connection. In the current version, add this at line 43:

    mysql_query("SET NAMES utf8");

    BigBri,

    I’m new to WordPress and have no coding experience, but I’ve done a good job in following instructions to fix problems with building my blog so far. Is the solution that you or graphox propose useful for solving the problem I have with “& g t ;” appearing where a “>” should appear on news feeds I’ve posted? (As in “NY Times > Home Page” reads NY Times & g t ; Home Page) You can see it at:

    grottomazoo.com

    I’d follow his or your instructions, but I’m not sure if the utf8 issue is the source of my problem — plus, I don’t want to completely wreck my site!

    Any help you can offer would be appreciated.

    If it is about feeds from another site… then no, the techniques above won’t help.
    You don’t have the feeds in your DB – it’s just a simple mismatch between the encoding of your blog and the encoding of the feed’s source. You can’t really do anything about it.
