the_content() is inserting weird 0xC2A0 characters

  • bennett

    (@bennett)


    I have a snippet of php code on the front page of http://www.peacefire.org that shows the content of the last 3 blog posts; the code just does the following inside The Loop:
    <?php the_post(); ?>
    <?php the_content('(More…)'); ?>

    You can see in the right-hand column on http://www.peacefire.org that it’s working, *almost*, except that the output of the_content() is inserting weird “Â” characters. (In case it doesn’t render correctly, that’s capital-A-with-a-pointy-hat, a.k.a. character 0xC2.) This happens both in Netscape 7 and in IE 6.

    I used a network sniffer to see exactly what bytes the server was sending me. Apparently in a couple of places where I typed two space characters in a row when I was composing the blog posts, the calls to the_content() are rendering them with the characters 0xC2A0 stuck in there. Why in the world would it do that?

    Oddly, if you view the page at http://www.peacefire.org/blog/ , when the blog posts are displayed on that page, the 0xC2A0 characters are *also* sent over the http connection, but in that case the browser renders them as spaces, so it isn’t a problem.

    I can probably work around this temporarily by deleting any instances of double-space-characters in the blog posts and writing single spaces instead. But does anyone know why it’s happening and how to prevent it?
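
A minimal sketch of the likely cause, assuming the editor stored one of the two consecutive spaces as a no-break space (U+00A0). The thread does not confirm that, but it would explain the bytes seen on the wire: 0xC2 0xA0 is the UTF-8 encoding of that single character. Read as ISO-8859-1, the same two bytes become two characters, "Â" followed by a no-break space. The variable names below are illustrative only.

```python
# Assumption: the "weird" bytes are one NO-BREAK SPACE (U+00A0) encoded as UTF-8.
NBSP = "\u00a0"

utf8_bytes = NBSP.encode("utf-8")
print(utf8_bytes.hex())  # c2a0

# A page served as UTF-8 decodes the two bytes back into one character,
# which the browser renders as a space.
as_utf8 = utf8_bytes.decode("utf-8")

# A page served as ISO-8859-1 treats each byte as its own character:
# 0xC2 is "Â" (U+00C2) and 0xA0 is a no-break space.
as_latin1 = utf8_bytes.decode("iso-8859-1")

print(len(as_utf8), len(as_latin1))  # 1 2
```

This would also be consistent with /blog/ rendering correctly while the front page does not: the bytes are identical, only the declared encoding differs.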

Viewing 6 replies - 1 through 6 (of 6 total)
  • yosemite

    (@yosemite)

    Thread Starter bennett

    (@bennett)

    When I use a telnet program to get “/index.php” from http://www.peacefire.org port 80, it is sent with the header:
    Content-Type: text/html; charset=ISO-8859-1

    On the other hand, when I get /blog/ , it is sent with the header:
    Content-Type: text/html; charset=UTF-8

    So, maybe that’s why the characters are displaying funny in the post content on http://www.peacefire.org. OK, but how do I fix it?
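
The comparison of the two headers above can be sketched in a few lines. This only parses header values that have already been captured (for example via telnet, as described); it makes no network request. The helper name `charset_of` is hypothetical.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# The two header values quoted in this post.
front_page = charset_of("text/html; charset=ISO-8859-1")
blog_page = charset_of("text/html; charset=UTF-8")

print(front_page)  # iso-8859-1
print(blog_page)   # utf-8
print(front_page == blog_page)  # False
```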

    I tried adding:
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

    to the <head> section of the http://www.peacefire.org front page, but that didn’t work, the A’s still showed up with their pointy hats.

    Meanwhile, even if there is a workaround, I still think it should be considered a bug that WordPress serves the content with those funny A-with-pointy-hat characters… a plain old 0x20 space would have been fine 🙂

    moshu

    (@moshu)

    Both pages (your index and the blog) MUST use the same encoding. Sending different encoding info confuses the browser. Your main page is still iso-8859-1 (Western), while WP by default is always utf-8.

    Edit. Yosemite gave you the solution: fix your main page, add doctype, eliminate the errors… and after that post back if it doesn’t work.

    Thread Starter bennett

    (@bennett)

    OK I didn’t realize that Yosemite was saying that the doctype error *specifically* was something I should fix to solve the problem. (I thought he just meant it as the first in a list of syntax errors…)

    So, I added a doctype to the top of the page, but the A’s with pointy hats are still there:
    http://www.peacefire.org/

    Yes, the page validator still lists other errors, but I don’t know which errors are probably related to this problem. Rather than spending hours trying to fix all the syntax errors that have nothing to do with the problem (and which haven’t been causing problems in any major browser in the years that the site has been up), can you tell me if there’s something I should change that will make it work with the WordPress excerpts?

    Even http://www.peacefire.org/blog/ , whose content is generated entirely by WordPress, gives some syntax errors in the validator, so it’s not like every page on the WWW has to pass with 100% validity in order to work 🙂

    moshu

    (@moshu)

    I will not argue about the usefulness of validation – I am not a validation-freak.
    However, it helps many times to locate the source of the problems. (Like in your blog: ALL the errors are caused by your posts, not the WP scripts – just for the record.)

    Back to your problem: if you try to re-validate your main page you can read a very clear explanation for your troubles: despite your utf-8 in the ‘meta’ tag – your server is “forcing” an iso-8859-1 encoding on your page… so the WP’s correct utf-8 encoded special characters will get garbled. So, again, it is NOT a WP issue, it is about the encoding sent by the HTTP header on your site!
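
A rough sketch of why the 'meta' tag did not help, under a simplified model in which a charset in the HTTP header always wins over one in a 'meta' tag. Real browsers also consider byte-order marks and may sniff content, which this ignores. The function name `effective_charset` and the `fallback` default are assumptions made for illustration.

```python
from typing import Optional


def effective_charset(http_charset: Optional[str],
                      meta_charset: Optional[str],
                      fallback: str = "iso-8859-1") -> str:
    """Simplified precedence: HTTP header first, then <meta>, then a fallback."""
    if http_charset:
        return http_charset.lower()
    if meta_charset:
        return meta_charset.lower()
    return fallback


# Bytes as WordPress would emit them: UTF-8, with a no-break space (U+00A0)
# standing in for one of two consecutive spaces (an assumption).
body = "two\u00a0 spaces".encode("utf-8")

# Server forces ISO-8859-1 in the header; the <meta> tag says UTF-8.
garbled = body.decode(effective_charset("ISO-8859-1", "UTF-8"))

# Server header changed to UTF-8, matching the <meta> tag.
clean = body.decode(effective_charset("UTF-8", "UTF-8"))

print("\u00c2" in garbled)  # True: a stray "Â" appears
print("\u00c2" in clean)    # False
```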

    Thread Starter bennett

    (@bennett)

    Well, also for the record, all of my blog posts which caused validation errors were created by writing and editing them in the WordPress interface 🙂 But even if those validation errors were caused by WordPress, I still wouldn’t call it a bug as long as it works fine in the browser.

    Now I did notice the error on the validator page about the encoding, however it makes no sense since it says: “The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the <meta> element (utf-8). I will use the value from the HTTP header (utf-8) for this validation.”

    Presumably the last sentence meant to say: “I will use the value from the HTTP header (iso-8859-1) for this validation.”

    Ah OK, so I crossed my fingers that this would be controlled somewhere in httpd.conf, and it was: I changed the line “AddDefaultCharset ISO-8859-1” to “AddDefaultCharset UTF-8”, and now http://www.peacefire.org/ loads fine, with no little pointy-hatted A’s in the blog excerpts.

    However, I would still argue that it’s a bug for WordPress to be inserting 0xC2A0 characters every time I type a double space, instead of using plain old 0x20 space characters, because there’s nothing lost by using 0x20 spaces, and 0xC2A0 spaces make the excerpts incompatible with pages that use ISO-8859-1 encoding. For a Web server to use ISO-8859-1 encoding is not in and of itself “wrong” so it would seem to make sense for WP to be compatible with those pages if there’s no sacrifice anywhere else.
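
For pages that must stay on ISO-8859-1, one possible workaround is to rewrite no-break spaces before the text is output. The sketch below is not a WordPress function and not how WordPress behaves; both helper names are hypothetical, and it assumes the offending character is U+00A0. It also shows the trade-off: HTML collapses runs of ordinary whitespace when rendering, so substituting a plain 0x20 space loses the visible double spacing, whereas the `&nbsp;` entity keeps it and is pure ASCII.

```python
NBSP = "\u00a0"


def nbsp_to_space(text: str) -> str:
    """Replace each no-break space with a plain 0x20 space."""
    return text.replace(NBSP, " ")


def nbsp_to_entity(text: str) -> str:
    """Replace each no-break space with the ASCII-only HTML entity."""
    return text.replace(NBSP, "&nbsp;")


sample = "End of sentence." + NBSP + " Next sentence."

plain = nbsp_to_space(sample)
entity = nbsp_to_entity(sample)

# Both results are pure ASCII, so they produce the same bytes whether the
# page is encoded as UTF-8 or as ISO-8859-1.
print(plain.encode("utf-8") == plain.encode("iso-8859-1"))    # True
print(entity.encode("utf-8") == entity.encode("iso-8859-1"))  # True
```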
