Using wp_insert_post with special UTF8 characters

  • Resolved Trahald


    How can I properly filter or encode UTF-8 characters, such as accented letters from other languages, before passing them to wp_insert_post()? I’m writing a plugin that uses wp_insert_post(), but I’m having trouble filtering/escaping these characters correctly.

    Take the letters o and u with an umlaut (Ö, ü): they come out garbled in my post title and post content.

    Here is what I’m doing:

    $post_tags = array('ödünç', 'müzik');
    $post_tags = array_map('utf8_uri_encode', $post_tags);
    $post_array = array(
        'post_content' => utf8_uri_encode('Go ödünç müzik'),
        'post_title'   => utf8_uri_encode('ödünç müzik'),
        'post_date'    => date("Y-m-d H:i:s"),
        'post_author'  => 0,
        'post_status'  => 'publish',
        'tags_input'   => $post_tags,
    );
    $inserted_post_id = wp_insert_post($post_array);

    Without any filtering, the accented characters don’t seem to show up at all. With utf8_encode(), ‘ödünç müzik’ displays as ‘�d�n� m�zik’. With the WordPress formatting function utf8_uri_encode(), it shows up as ‘dn%f6%fc%e7 mzik’.

    If I post this phrase manually through the Dashboard, it displays fine. What filtering, escaping, or encoding does WordPress apply to Dashboard input that I should also apply to the parameters in my plugin?
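The %f6, %fc and %e7 in the utf8_uri_encode() output above are the single-byte ISO-8859-1 (Latin-1) codes for ö, ü and ç; in UTF-8 each of those characters takes two bytes (ö is C3 B6). That hints, though the thread never confirms it, that the string literals in the plugin file may have been saved as Latin-1 rather than UTF-8. A minimal sketch for checking and normalizing a string before it goes into wp_insert_post() could look like this. All three function names are hypothetical, the code assumes that anything which is not valid UTF-8 is Latin-1, and it uses only core PHP (preg_match() with the u modifier, ord(), chr()), not any WordPress API.

```php
<?php
// Sketch only: hypothetical helpers, not part of WordPress.
// Assumption: any string that is not valid UTF-8 is ISO-8859-1 (Latin-1).

// True if $s is well-formed UTF-8. With the /u modifier, preg_match()
// returns false (not 0 or 1) when the subject is malformed UTF-8.
function hypothetical_is_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Convert Latin-1 bytes to UTF-8 without needing mbstring or iconv.
// Bytes below 0x80 are ASCII and copied as-is; every byte 0x80-0xFF
// becomes a two-byte UTF-8 sequence (e.g. 0xF6 "ö" -> 0xC3 0xB6).
function hypothetical_latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($s[$i]);
        if ($byte < 0x80) {
            $out .= $s[$i];
        } else {
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

// Leave valid UTF-8 untouched; treat anything else as Latin-1 and convert it.
function hypothetical_ensure_utf8(string $s): string
{
    return hypothetical_is_utf8($s) ? $s : hypothetical_latin1_to_utf8($s);
}
```

A plugin could run the title, content and each tag through hypothetical_ensure_utf8() and pass the plain results to wp_insert_post(), without utf8_uri_encode(). This only rescues genuinely Latin-1 input; as the resolution of this thread shows, correctly encoded UTF-8 can still be mangled if the database connection is not set up for UTF-8.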

  • Does anybody have experience with, or insight into, these WordPress character encoding issues?

    Note that I’m using WP 2.3.

    I’ve analyzed write_post(), wp_write_post(), wp_insert_post(), and even upgrade.php’s wp_install(), and still don’t see anything related to any sort of escaping or character encoding.

    The same issue affects wp_install(). If I install a blog through WP’s install.php with a title containing umlauts, blog_title is set correctly. But if I write a test script that calls wp_install() to create the blog, and pass $weblog_title via GET instead of POST, or read it from a temporary database table that contains the string, the blog title gets garbled, even though echoing $weblog_title inside the script shows it correctly.

    I also tried passing the title string with my own POST form, but even that didn’t work.

    Is WordPress modifying the $_POST array somehow? I see that wp-settings.php applies stripslashes_deep() and add_magic_quotes() to $_POST. However, applying these to the $post_array I pass to wp_insert_post() didn’t help, and commenting those lines out of wp-settings.php didn’t break posting from the Dashboard either; that still worked.

    I’m just about at my wits’ end with these umlauts (and the trademark symbol too). Does anyone know anything about this? I don’t understand why these characters work fine when entered through WordPress’s default forms but get garbled when passed in from a plugin or external script.

    After dozens of hours of wasted effort, I finally figured out the problem. My WordPress install had been upgraded from an older version, so my wp-config.php was missing the following:

    define('DB_CHARSET', 'utf8');
    define('DB_COLLATE', '');

    Adding this fixed the problem. Such a ridiculous error, but I’m glad to have it solved 🙂
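Since the root cause here was a missing DB_CHARSET, which tells WordPress which character set to use for its database, a plugin that depends on wp_insert_post() storing UTF-8 correctly could check for it up front. The following is a minimal sketch: the function name is hypothetical, DB_CHARSET is the wp-config.php constant from the fix above, and the check only inspects that constant rather than querying MySQL, so it assumes the constant reflects what WordPress actually uses.

```php
<?php
// Sketch only: hypothetical helper, not a WordPress API.
// Returns true when wp-config.php declares a UTF-8 database charset.
function hypothetical_db_charset_is_utf8(): bool
{
    if (!defined('DB_CHARSET')) {
        // Installs upgraded from older versions may lack the constant entirely.
        return false;
    }
    // 'utf8mb4' is MySQL's full four-byte UTF-8 charset, used by later installs.
    $charset = strtolower((string) constant('DB_CHARSET'));
    return in_array($charset, ['utf8', 'utf8mb4'], true);
}
```

A plugin might call this when it loads and, if it returns false, show the site owner a notice pointing at wp-config.php instead of inserting posts whose accented characters could end up garbled.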

    And thank you for posting back the solution!
