• Hiya. Relatively new to WordPress. I noticed an issue with my RSS feed: when I set the blog's description to include a special character written as an HTML entity, the feed fails to re-encode the & as &amp;, which makes the XML invalid.

    As a specific example, I've currently got the blog (http://ziggurat.org/wordpress/) set up with an em dash in its description, written as &mdash;. The RSS 2.0 feed outputs it unchanged, where I'd expect it to become &amp;mdash;. E.g.:

    <channel>
    <title>the everyday adventures of sabrina</title>
    <link>http://ziggurat.org/wordpress</link>
    <description>when you cut the lights out, think of me&mdash;</description>

    versus (a good feed off my current blog software):

    <channel>
    <title>the everyday adventures of sabrina</title>
    <link>http://ziggurat.org/blox</link>
    <description>my insecurities could eat me alive&amp;mdash;</description>
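To illustrate why the first excerpt is invalid: XML predefines only five entities (&amp;, &lt;, &gt;, &quot;, &apos;), so an HTML entity like &mdash; is a well-formedness error unless a DTD declares it. The sketch below is a hedged illustration, assuming PHP 7+ with the standard SimpleXML extension; the helper name is_well_formed() is hypothetical. It parses each <description> fragment and reports whether parsing succeeded.

```php
<?php
// Hypothetical helper: returns true if $xml parses as well-formed XML.
// XML predefines only &amp; &lt; &gt; &quot; &apos;, so an undeclared
// HTML entity such as &mdash; makes the parser fail.
function is_well_formed(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc !== false;
}

$bad  = '<description>when you cut the lights out, think of me&mdash;</description>';
$good = '<description>my insecurities could eat me alive&amp;mdash;</description>';

var_dump(is_well_formed($bad));  // bool(false)
var_dump(is_well_formed($good)); // bool(true)
```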

    I'm surprised to run into this. Since I'm late to the whole WordPress game, I'd expect anyone who writes in another language to have hit it before me, so I kinda assume I've done something wrong rather than found an actual bug. I don't see how, though: I've only installed one plugin (timezone), and the problem also occurs with the default theme (not that a theme seems like it should affect the feed anyway).

    I looked over the code to see where this is handled, but I'm new to PHP and not sure I'm reading it correctly. Within normalize() (in rss-functions.php) I tried something like str_replace('&([A-Za-z0-9]{1.8};)', '&amp;$1', $foo) on $this->channel['tagline'], but that was really just a stab in the dark. I didn't get any PHP errors, but it didn't help either, so I backed out my changes.
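On the regex question: str_replace() matches its search string literally, so a pattern passed to it is never treated as a regex. Regex replacement in PHP goes through preg_replace(), which needs pattern delimiters (here /.../), and the repetition quantifier is written {1,8} with a comma. A minimal sketch, assuming PHP 7+; the function name escape_entity_ampersands() is hypothetical, and the optional # (to also cover numeric entities like &#8212;) is an addition not present in the original attempt.

```php
<?php
// Hypothetical helper: turn the "&" that starts an entity into "&amp;",
// so "&mdash;" becomes "&amp;mdash;". Bare ampersands are left alone.
function escape_entity_ampersands(string $text): string
{
    // Delimiters are required by preg_replace(); {1,8} uses a comma.
    // $1 re-inserts the captured entity name plus its semicolon.
    return preg_replace('/&(#?[A-Za-z0-9]{1,8};)/', '&amp;$1', $text);
}

echo escape_entity_ampersands('think of me&mdash;'), "\n"; // think of me&amp;mdash;
```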

    So if anyone has suggestions, I'd appreciate them: either for fixing this directly (assuming I've done something wrong) or, if something actually needs to change in the code, pointers for quickly learning how PHP does regexes and where to read about the RSS-specific functions in WordPress. In particular, I couldn't tell what they mean by “description” versus “tagline”, i.e. whether “description” refers literally to the content of the XML channel description field or is just describing what the fields are. I tried searching the help without much luck, but maybe I was just having a bad search-terms day. 🙂

    (Oh, now that I think of it, there was one change to normalize() that I didn't back out: an apparent typo. In the $this->is_atom() bit, the next line referred to $this->channel['descripton'], which I changed to ‘description’ since there didn't seem to be any other references to ‘descripton’.)

    Thanks! (Ooh, I hope all the stuff I crammed in here renders the way I want it to when it posts… 🙂)

Viewing 1 replies (of 1 total)
  • Thread Starter sldownard

    (@sldownard)

    Okay, I believe this is fixed, via an ugly hack that probably isn't the right way to do it (but I'm okay with that). I'd still welcome feedback from someone who, like, actually knows PHP. 🙂

    I fixed it by modifying get_bloginfo_rss() in feed-functions.php to do a manual str_replace() specifically for ampersands:

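    function get_bloginfo_rss($show = '') {
        // Fetch the requested blog setting and strip any HTML tags from it.
        $info = strip_tags(get_bloginfo($show));
        // The hack: escape every ampersand, including one that starts an
        // entity, so "&mdash;" comes out as "&amp;mdash;".
        $info = str_replace('&', '&amp;', $info);
        return convert_chars($info);
    }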

    I believe this happens because convert_chars() appears to replace only bare ampersands; an ampersand followed by # or by a semicolon-terminated string (i.e. something that looks like an entity) is left alone.
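The behaviour described above can be approximated with a negative lookahead. This is a hedged sketch, NOT WordPress's actual convert_chars() code: the function name escape_bare_ampersands() and its exact pattern are assumptions chosen to mirror the description (bare & becomes &#038;, entity-like sequences pass through). It shows why &mdash; survives unescaped, and why running the str_replace() hack first changes the result.

```php
<?php
// Approximation of the described convert_chars() behaviour (hypothetical,
// not WordPress source): escape a bare "&", but leave any "&" that starts
// something entity-like ("&name;" or "&#123;") untouched.
function escape_bare_ampersands(string $text): string
{
    return preg_replace('/&(?!#?[A-Za-z0-9]{1,8};)/', '&#038;', $text);
}

$desc = 'think of me&mdash;';

// Without the hack, "&mdash;" looks like an entity and passes through as-is.
echo escape_bare_ampersands($desc), "\n";
// With the hack, "&" is escaped first, so the output contains "&amp;mdash;".
echo escape_bare_ampersands(str_replace('&', '&amp;', $desc)), "\n";
```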

    So. Ugly, but it seems to work, so I’m going to go with it for now.
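For comparison, one alternative to the str_replace() hack (an assumption on my part, not what WordPress does) is to decode HTML entities into real characters and then escape only what XML requires. This sketch assumes PHP 7+ and a feed served as UTF-8; the name xml_safe_text() is hypothetical. Unlike the hack, it emits a literal em dash rather than &amp;mdash;, and it will not double-escape a description that already contains &amp;.

```php
<?php
// Hypothetical helper: decode HTML entities (e.g. &mdash;) to UTF-8
// characters, then escape &, <, >, " and ' for XML output.
function xml_safe_text(string $text): string
{
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo xml_safe_text('think of me&mdash;'), "\n"; // "think of me" followed by a U+2014 em dash
echo xml_safe_text('fish & chips'), "\n";       // fish &amp; chips
```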

  • The topic ‘HTML entities in blog descriptions’ is closed to new replies.