Support » Fixing WordPress » Database character set and collation

  • Jim

    (@jwmc)


    My site has been on WordPress just over 3 years I think. About 6 months ago I migrated to a different host.

    Recently an error showed up in debug log when someone tried to spam my site. It began like this, followed by a bunch of spam content:
    [18-Dec-2020 17:00:52 UTC] WordPress database error Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,COERCIBLE) for operation 'like' for query SELECT SQL_CALC_FOUND_ROWS wp75_posts.ID FROM wp75_posts WHERE 1=1 AND wp75_posts.ID NOT IN (5729,5709,4178,2170,2433,1966,1908,1800,650,76) AND (((wp75_posts.post_title LIKE . . .

    Someone told me this indicates a problem in my database, that I have the wrong character set and/or collation. wp-config.php has:

    define('DB_CHARSET', 'utf8');
    define('DB_COLLATE', '');

    In phpMyAdmin, it shows Server connection collation is “utf8mb4_unicode_ci”, and Server charset is “UTF8 Unicode (utf8mb4)”. But the tables have a variety of collations:

    utf8mb4_unicode_ci     (tables of 2 plugins)
    utf8mb4_unicode_520_ci (only one plugin table has this)
    utf8_general_ci        (all WP core and Wordfence tables)

    The site seems to work okay. Do I have a problem? I’ve seen a lot of different ideas about how to fix this if necessary, and would appreciate some authoritative pointers. Thanks.

    The page I need help with: [log in to see the link]

Viewing 5 replies - 1 through 5 (of 5 total)
  • Moderator bcworkz

    (@bcworkz)

    Don’t consider this authoritative, but I do know something about this subject. My advice is to not mess with charsets unless you really know what you’re doing. You stand to corrupt data if you don’t. Your DB connection is the 4 byte variant of UTF-8, that’s perfect.

    OTOH, the collations dictate the ordering of alphabetical query results. Changing them will not corrupt data, though specifying the wrong ones could cause unexpected query results. The problem arises when data with two different collations need to be combined. SQL doesn’t know how to handle that. utf8mb4_unicode_ci is the default collation for utf8mb4 charset, so I’d make the DB and all tables and all applicable columns that specific collation.

    Even though collation change does not corrupt data, make a full backup of all data before making changes. Better safe than sorry.

    Thread Starter Jim

    (@jwmc)

    Thanks. Well I certainly don’t really know what I’m doing. Seems like there are so many places where these things are set: database connection/server, wp-config.php, the database, each table, and each column.

    But I found this https://sternerstuff.dev/2019/04/convert-wordpress-tables-to-utf8mb4/
    which is just a script to run the function maybe_convert_table_to_utf8mb4() from /wp-admin/includes/upgrade.php. I ran that in staging and no harm, all the tables and columns that were utf8 were converted to utf8mb4 with collation utf8mb4_unicode_ci.

    Then here https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/
    is info about indexes that might have to be resized. Most didn’t need it (already 191 characters), one got resized automatically from the script, and one remained 200, which I changed to 191 in phpMyAdmin. Finally change the wp-config defines to the new charset and encoding.

    Everything looks good in staging, so I’m tempted to do it in live. They say all the encodings in utf8 work in utf8mp4, so there should be no problems of changing characters.

    Moderator bcworkz

    (@bcworkz)

    They say all the encodings in utf8 work in utf8mp4

    I too believe that to be correct. utf8 encodes with 1-3 bytes per character, utf8mb4 encodes 1-4 bytes per character. It extends the upper range of available encoding by 8 fold.

    I was unsure about what to define for WP_CHARSET. The WP docs are pretty adamant about leaving it ‘utf8’. Apparently as long none of our data gets up into the 4 byte encoding range, this works even if the DB connection is utf8mb4. If we know the connection is utf8mb4, it should be appropriate to define WP_CHARSET as ‘utf8mb4’. Leaving DB_COLLATE defined as '' is always appropriate, WP will use what is defined for the DB.

    Good luck going live, though I don’t think you need it. As always, keep backups 🙂

    Greeting,i have the problem with database from site that i made locally on computer by Bitnami app,and when i upload my database to pumbilc_html folder of hosting ,they told me that it apears some king of error charset utf 8 collate utf8_general_ci ,and he ask his admin and he told me that i need to change in my sql file but when i opened the copy of my database file there are lot of places where is there that kind of code so i didnt want to anything is there any advice for me what to do,and also in phpmyadmin there is collation set to utf8mb4_unicode_520_ci for almost all files

    But I found this https://sternerstuff.dev/2019/04/convert-wordpress-tables-to-utf8mb4/
    which is just a script to run the function maybe_convert_table_to_utf8mb4() from /wp-admin/includes/upgrade.php. I ran that in staging and no harm, all the tables and columns that were utf8 were converted to utf8mb4 with collation utf8mb4_unicode_ci.

    Sorry, but where do I have to run this script? I’m a bit lost…

Viewing 5 replies - 1 through 5 (of 5 total)
  • You must be logged in to reply to this topic.