Support » Developing with WordPress » MySQL encoding and DB backup/restore/transfer issues

  • Resolved theUg


    I’ve got some issues here. I want to let everybody know, that the issues are between two free hosters, apriori possibly crappy, but I’m trying to resolve this situation, and learn in the process, because I have virtually no experience in this.

    1. The site is UTF-8 encoded. The MySQL is UTF-8 encoded as well. Majority of the blogposts are in Russian (cyrillic). So, the site works fine (when it works), no problem with content. However, what bugs me is the representation of the content in DB. If I access DB through phpMyAdmin, all the fields with cyrillyc posts, comments, etc. show up in a mumbo jumbo like that: ообще оченÑ.

      I also noticed, that “collation” field in wp_ tables have value of latin1_swedish_ci. What is collation in DB terms, and why it has such peculiar choice of value?

    2. So, if I backup DB dump as a .sql text file on my machine it shows same обще очРin the text editor. So same way it shows in the other server’s DB after import.

      Now, the interesting thing starts. After import I access website, and most of the content is fine, but some of it is corrupt. It’s only certain letters that are corrupt (I confirmed a few: small (с), (э), (я), and capital (А) and (Н)).

      For instance, phrase “тест, тест, тест” (test, test, test) turns into “те�?т, те�?т, те�?т” on the site itself, and in the RSS-feed.

      You can see it here: (it’s a test site for now (since I’m not transferring domain, till I figure this out), so feel free screwing with it).

    3. However, note, that second comment shows up correct. That’s because I posted it on the new host, so all the new content will show up correct.

      But yet again another peculiarity: both of this comments show up in DB absolutely identical with the value of “ТеÑ?Ñ‚, теÑ?Ñ‚, теÑ?Ñ‚.”.

    So, what can cause all this? Could it be phpMyAdmin issue? Could it be MySQL misconfiguration on one (or both) of the hosts? Is it purely encoding mismatch, or some internal bug in WP? Is it at all typical? How do non-latin characters are being recorded in other peoples’ BD? Why is collation set to Swedish?

    P. S. The MySQL version on the first host is 4.0.24, on the second — 4.1.11-Debian_4sarge7. WP version 2.1.

Viewing 3 replies - 1 through 3 (of 3 total)
  • I read up on charsets and collations in MySQL docs, and found that latin1_swedish_ci is the default collation for cp1252 West European (which is default charset for MySQL). That is still weird: why doesn’t WP convert DB to UTF-8, if I specified UTF-8 as the default encoding?

    Is that because some hosters won’t allow for scripts to change DB settings?

    Many hosters just haven’t realised since they upgraded from older version MySQL 4.0.2x to newer 4.1.x or 5.x

    There are approx 3 lines in the my.cnf that they can remove to not enforce encoding at server level, and hence allow you to choose.

    Ask them nicely. 🙂

    After some search, discussion, and fine handywork with the file, hammer and ductape, I figured this out. Firstly, it was necessary to set the collation to UTF as soon as DB was created. Secondly, it is important to add SET NAMES to the /wp-includes/wp-db.php file. This way DB will properly understand non-latin characters.

    I wasn’t able do decode corrupted dump, although I still think it is possible, but I used built-in export/import system, which perfectly exported the XML file in proper encoding. I had some problems with import, when some internal links got lost, so I had to manually edit DB to fix broken relations. But all in all it is fixed now.

    I still would argue that SET NAMES directive should be the standard part of UTF-based WP.

Viewing 3 replies - 1 through 3 (of 3 total)
  • The topic ‘MySQL encoding and DB backup/restore/transfer issues’ is closed to new replies.