WordPress.org

UTF-8 and ISO-8859-1 (14 posts)

  1. anatman
    Member
    Posted 10 years ago #

    WP 1.2 supports UTF-8, that's nice. But what do I do with my old posts containing special characters? These characters display correctly only if I set 1.2 to use ISO-8859-1 on the options page.
    Is there a way to have the old posts converted?
    Thanks

  2. Beel
    Member
    Posted 10 years ago #

    In lieu of a permanent solution, I think you could probably do it using vars.php. Give it a look-see and add the character changes there.

  3. anatman
    Member
    Posted 10 years ago #

    Hm!
    'ISO code here' => 'UTF code here', and WP will take care of translating the ISO to UTF? Is that all?
    Thank you, Beel.

  4. anatman
    Member
    Posted 10 years ago #

    I just thought of something: suppose I use this in vars.php:
    234 => 135 (just making up something as an example)
    That tells WP to use Unicode #135 every time it sees a #234, right? That may cause some conflicts: what if I do want to use #234 someday?
    I think there is a need to provide conversion for previous posts, if Unicode support is to be serious. Am I right?

  5. anatman
    Member
    Posted 10 years ago #

    THAT looks good! I will try it right now, after backing up the DB of course :)

  6. anatman
    Member
    Posted 10 years ago #

    Update: I couldn't do the MySQL string replacement. What I did was open the dump in a text editor and look inside a post to check how the strings with the ISO codes were written. But there are no ISO code strings in the dump! The dump reads nearly as a normal text file, with no HTML entities or ampersands followed by numeric codes.
    As for the other option, a conversion in the vars.php file: I was listing the ISO-to-UTF codes, and the codes look the same for many letters! For example, I checked the capital O with acute accent: the ISO table I am using shows #211, and a UTF table shows the same #211 under the column "U-dec" (which seems to be the kind of value used for UTF in the vars.php file).
    What am I missing here?
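
The matching numbers are expected: ISO-8859-1 and Unicode assign the same code points to the first 256 characters. What differs is how each encoding stores that code point as bytes. A minimal Python sketch of the difference, using the capital O with acute accent from the post (the variable names are illustrative only):

```python
# Capital O with acute accent, written as an escape so this file's own
# encoding does not matter.
ch = "\u00d3"

code_point = ord(ch)                     # the number both tables list
latin1_bytes = ch.encode("iso-8859-1")   # stored as a single byte
utf8_bytes = ch.encode("utf-8")          # stored as two bytes

print(code_point, latin1_bytes, utf8_bytes)

# The old single byte is not valid UTF-8 on its own, which is why old
# posts break when the blog is switched to UTF-8.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)
```

So a table mapping 211 to 211 changes nothing; the stored bytes themselves have to be re-encoded.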

  7. anatman
    Member
    Posted 10 years ago #

    (I am about to bang my head on the table in front of me. I can't even make the WP forum display a special character like a capital O with an acute accent. I have edited the above post 4 times trying to, to no avail, which makes me feel very, very stupid)

  8. Anonymous
    Unregistered
    Posted 10 years ago #

    That character? Ó

  9. anatman
    Member
    Posted 10 years ago #

    Yes, that character. Man :-/
    Well, anyway, the conversion was sorted out by michel_v. Over WP's IRC channel he told me to open the SQL dump in an editor, save it in UTF-8 encoding, and load it into the database again. That did the trick.
    Beel, thank you one more time!

  10. dariottolo
    Member
    Posted 10 years ago #

    Hi anatman,
    I have the same problem that you had.
    Could you please post a step-by-step guide? If possible with the programs you used.
    Thanks a lot in advance.
    Dario

  11. joern
    Member
    Posted 10 years ago #

    If you have a Linux system:
    iconv -f iso-8859-15 -t utf-8 < dbdump > dbdump.1
    iconv should be available on all glibc2 systems
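
Where iconv is not at hand, the same re-encoding can be sketched in Python. This is only an illustration, not part of the thread's solution: `convert_dump` is a hypothetical helper name, and the `iso-8859-15` default simply mirrors the command above. The source encoding has to match whatever the dump was really written in (the thread title mentions ISO-8859-1), so check that and keep a backup first.

```python
def convert_dump(src_path, dst_path, source_encoding="iso-8859-15"):
    """Re-encode a text dump to UTF-8, writing the result to a new file.

    Hypothetical helper; source_encoding is an assumption and must match
    the encoding the dump was actually saved in.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Copy in chunks so a large dump is never held in memory at once.
        while True:
            chunk = src.read(64 * 1024)
            if not chunk:
                break
            dst.write(chunk)
```

Calling `convert_dump("dbdump", "dbdump.1")` would correspond to the iconv line above. The original file is left in place, so the result can be inspected before it is loaded back into the database.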

  12. joern
    Member
    Posted 10 years ago #

    On Windows I don't know. On Linux you could do this via ssh. As you can see, I converted the whole db with no drawbacks so far. The next problem was Apache, where you should have:
    AddDefaultCharset utf-8
    in the configuration for your blog directory.

  13. dariottolo
    Member
    Posted 10 years ago #

    Hi joern,
    thanks very much for your help!
    The support of my hoster did that for me :) Now everything looks great.
    There is still a problem with comment emails, which contain weird characters... But that's not so important.
    Could you please let me know why the default charset should be changed in Apache?
    Is it necessary?
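
The thread does not say what caused the weird characters in the comment emails. One common cause of symptoms like this is UTF-8 text being displayed as if it were ISO-8859-1, which may or may not be what happened here. A small Python sketch of that mismatch, using a made-up sample word:

```python
# Sample text with one accented letter (e with acute), written as an escape.
original = "caf\u00e9"

# Encoding and decoding with the same charset gives the text back.
utf8_bytes = original.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Mismatch: reading the UTF-8 bytes as ISO-8859-1 turns the accented
# letter into two unrelated characters.
garbled = utf8_bytes.decode("iso-8859-1")

# As long as nothing was lost, the mix-up can be reversed.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(round_trip, garbled, repaired)
```

If the emails show pairs of characters where a single accented letter should be, a charset mismatch of this kind is a reasonable first thing to check.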

  14. joern
    Member
    Posted 10 years ago #

    It was necessary because Mozilla/Firefox detected the page encoding as iso-8859-1 regardless of what's in the content-type meta tag. Mozilla makes this decision because of the Apache server headers, which are normally set to iso-8859-1.
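
The precedence described here, where a charset in the HTTP Content-Type header wins over the one in the meta tag, can be sketched in Python. This is a simplified model with hypothetical function names; real browsers also consider byte order marks and user overrides, which are ignored here.

```python
from email.message import Message


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, None when absent


def effective_charset(header_value, meta_charset):
    """Simplified rule: a charset in the HTTP header beats the meta tag."""
    return charset_from_header(header_value) or meta_charset.lower()


# The server declares ISO-8859-1 while the page's meta tag says UTF-8.
print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))

# No charset in the header, so the meta tag decides.
print(effective_charset("text/html", "utf-8"))
```

With `AddDefaultCharset utf-8` in place, the header and the meta tag agree, so the conflict does not arise.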

This topic has been closed to new replies.
