I noticed that a guide for converting the WordPress character set is now available on the Codex. However, its instructions are pretty vague and I’m not comfortable following them.
Has anyone tried this successfully? If so, what were the commands you used to convert the database? I would love it if someone would post a SQL script to achieve this. Such a script should be bundled with the next WordPress install.
I’ve seen it but didn’t try it. Same reason, it all sounds rather vague and I don’t wanna make a mistake :).
I’m also anxious to convert my site from latin1 to utf8!
I don’t get it: In my database there are tables reserved for the gallery program Plogger (utf-8), the rest is WordPress, all collated and charset latin1.
Now I changed the collate/charset in the config.php file to utf-8, imported a backup of this database in a fresh install, changed the ‘read option’ under options from iso-8859-1 to utf-8 and all my posts seem to be allright.
Is that possible? I don’t get it. These small steps didn’t change my database or dit it???
I’m not sure if reimporting your posts does the trick, even on a fresh install. I’m pretty sure you want to convert the text entries to BLOB like the guide says, then convert the charset to UTF-8. However, like I mentioned previously, the guide is very cryptic and likely to be problematic.
I followed the guide recently, made myself a little PHP script to generate the various SQL commands for me.
The source is here if anyone wants to use it:
generate sql for utf8 conversion
I actually made this as a template that I ran (made an empty page that used the template), worked as a charm. The only thing one need to change is the name of the database in the first foreach loop.
Well 6 days ago i release a plugin called “UTF-8 Database Converter” that easy converts the wordpress database, today i release the second version (2.0) which adds certain new logic an a better solution to the problem.
g30rgg3x’s plug-in worked for me. I added the appropriate entries to wp-config first.
Anderapt, thanks for the script. I have a look at yours and the guide in codex, I’m a little confused. Is there any need to convert data type to BLOB first?
I tried a little script from here (not for WP but easy to convert) which worked fine at my localhost. Before I risk my data/database at hosting can someone clarify the need to convert data type to BLOB and back? This little script from sveit.com doesn’t do that step.
The codex don’t mention “why you do this or that” because this topic is a little complex and advance so the users that want to do this normally have a partially knowledge about the problem.
The need to convert to the data type BLOB is simply, what the codex is trying to say is that you need to convert the data type (string type) to his binary representations, we need to that in order to avoid problems of characters getting lost or garbage-characters, that it will ruin all our data in a simply. This is a safe-step because normally you wouldnt bother, the only chars affected are foreign chars like the “ñ” in spanish languages, this char if is not converted before in its binary representation equivalence it will be lost and sometimes screw all your database, so the risk that you have accepted using the script you provide is that, if you have foreign characters they will be converted to trash during the conversion process…
Adding just a little more explanation for why you really need to convert all text(string fields) to blob its because -as i previously says- this is a safe step for converting, because setting to binary it will not be affected the data during conversion and after convertion it will be easy to rollback from binary to string types.
A little list about string-data-type to his binary representations data type is listed next
char -> binary
varchar -> varbinary
tinytext -> tinyblob
text -> blob
mediumtext -> mediumblob
longtext -> longblob
enum -> set charset to binary
set -> set charset to binary
All this fields need to be converted to his correctly representations before convertion, as you can see in the codex thay give the example of converting a text type to blob, but if you need to convert a lets say longtext you will need to change longtext to longblob.
Great explanation now I would only use your plugin for my data in the hosting.
Just tried your great plugin on localhost, it works great. One little question forgive my little knowledge my mysql, I have a quick glance on your code it alters db (line 126) and fields of each table (line 128) to utf8. I couldn’t see anything is done to collation of fields of each table?
When I did my little test on localhost collation was changed to utf8_general_ci? What if someone needs a different collation?
What if someone needs a different collation?
Why would they?
The whole mess around upgrades was caused in part because of the earlier mismatch in encoding and collation (+mysql’s own bugs).
That collation should work well for everybody, I guess.
yes as moshu says,
The “general” (default for utf8) collation should work well for everybody, so i don’t wanna to mess around with this because normally if someone really needs a special collation i think they have the knowledge to do the job himself rather than using automated conversion tools, but in the next version of my plugin i will consider that feature (even i know its really pointless considering that i’m just converting WordPress powered sites, not any type of database)
Justed tried the conversion, and I noticed that some of my plugings stop working properly. I figured out that the problem was in the configuration strings (in the options table). As an example:
You’ll notice that the term “Catégories” has an e-acute in it. If you convert to UTF-8, then this will break because the byte count will be wrong (s:10).
You need to use PhpMyAdmin and change manually the byte count so it reflects the change correctly.
s:10 would become s:11
But, I still have a problem though…. my widgets editor isn’t working properly. I cannot modify the properties of an active widget. The popup doesn’t appear at all.
- The topic ‘Convert database to UTF-8’ is closed to new replies.