
HTML entities in blog descriptions (2 posts)

  1. sldownard
    Member
    Posted 5 years ago

    Hiya. I'm relatively new to WordPress. I noticed an issue with my RSS feed: when the blog's description includes a special character written as an HTML entity, the feed doesn't re-encode the & as &amp;, so the resulting XML is invalid.

    As a specific example, I've currently got the blog (http://ziggurat.org/wordpress/) set up with an em dash, written as &mdash;, in its description. The RSS 2.0 feed outputs it unchanged, where I expect it to have become &amp;mdash;. E.g.:

    <channel>
    <title>the everyday adventures of sabrina</title>
    <link>http://ziggurat.org/wordpress</link>
    <description>when you cut the lights out, think of me&mdash;</description>

    versus a valid feed from my current blog software:

    <channel>
    <title>the everyday adventures of sabrina</title>
    <link>http://ziggurat.org/blox</link>
    <description>my insecurities could eat me alive&amp;mdash;</description>

    I'm surprised to run into this, since I'd expect anyone who writes in another language to have hit it long before me (I'm late to the whole WordPress game). So I kinda assume I've done something wrong rather than found an actual bug, though I don't know what: I've only installed one plugin (timezone), and it happens with the default theme as well (not that the theme should affect the feed anyway).

    I looked over the code to find where this is handled, but I'm new to PHP and not sure I'm reading things correctly. Within normalize() (in rss-functions.php) I tried something like str_replace('&([A-Za-z0-9]{1.8};)', '&amp;$1', $foo) on $this->channel['tagline'], but that was really just a stab in the dark. I didn't get any PHP errors, but it didn't help either, so I backed out my changes.
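    A note on why that attempt could not work: str_replace() matches its search string literally, with no regex support, so it only ever looks for the exact characters '&([A-Za-z0-9]{1.8};)'. A regex needs preg_replace(), and the quantifier is written {1,8} with a comma. A minimal sketch, assuming the goal is to escape the & that starts each entity reference; escape_entity_ampersands() is a hypothetical helper, not a WordPress function:

```php
<?php
// Hypothetical helper (not part of WordPress): escape the leading "&" of any
// entity reference such as "&mdash;" or "&#8212;" so it appears in the XML as
// literal text, e.g. "&amp;mdash;". Bare ampersands ("Tom & Jerry") are left alone.
function escape_entity_ampersands(string $text): string
{
    // preg_replace() treats the pattern as a PCRE regex; "$1" in the
    // replacement refers to the captured entity name plus its semicolon.
    return preg_replace('/&(#?[A-Za-z0-9]{1,8};)/', '&amp;$1', $text);
}

echo escape_entity_ampersands('think of me&mdash;'), "\n"; // think of me&amp;mdash;
```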

    So really, any suggestions would be appreciated: either for fixing this directly (assuming I've done something wrong), or, if the code actually needs changing, pointers for quickly learning how PHP does regexes and where to read up on WordPress's RSS-specific functions. In particular, I couldn't tell what they mean by "description" versus "tagline": is "description" used literally for the content of the XML channel description element, or just loosely to describe what a field is? I tried searching the help without much luck, but maybe I was just having a bad search-terms day. :)

    (Oh, now that I think of it, there was one apparent typo in normalize() that I fixed and didn't back out: in the $this->is_atom() branch, the next line referenced $this->channel['descripton'], which I changed to 'description' since there were no other references to 'descripton'.)

    Thanks! (Ooh, I hope all the stuff I crammed in here renders the way I want it to when it posts... :)

  2. sldownard
    Member
    Posted 5 years ago

    Okay, I believe this is fixed, via an ugly hack that probably isn't doing the right thing (but I'm okay with that). I'd still welcome feedback from someone who, like, actually knows PHP. :-)

    I changed get_bloginfo_rss() in feed-functions.php to do a manual str_replace() specifically for ampersands:

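    function get_bloginfo_rss($show = '') {
        $info = strip_tags(get_bloginfo($show));
        // Escape every ampersand, including those that start entities,
        // so "&mdash;" becomes "&amp;mdash;" in the feed
        $info = str_replace('&', '&amp;', $info);
        return convert_chars($info);
    }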

    I believe this happens because convert_chars() only appears to replace bare ampersands; if an ampersand is followed by # or a semicolon-terminated string, it's left alone.
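    To illustrate the behaviour described above, here is a rough stand-in for the ampersand handling attributed to convert_chars(). It is an assumption based on that description, not the actual WordPress source, and approx_convert_ampersands() is a hypothetical name:

```php
<?php
// Approximation of the described convert_chars() ampersand rule (NOT the
// WordPress source): an "&" becomes "&#038;" only when it is not followed by
// "#" and not followed by up to 8 letters/digits ending in ";".
function approx_convert_ampersands(string $text): string
{
    return preg_replace('/&(?!#)(?![A-Za-z0-9]{1,8};)/', '&#038;', $text);
}

echo approx_convert_ampersands('salt & pepper'), "\n";      // salt &#038; pepper
echo approx_convert_ampersands('think of me&mdash;'), "\n"; // think of me&mdash; (unchanged)
```

    Under that rule "&mdash;" looks like an entity reference and is passed through, but XML itself only predefines a handful of entities (mdash is not one of them), so the feed ends up invalid.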

    So. Ugly, but it seems to work, so I'm going to go with it for now.
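    For comparison, one possible alternative to the hack, assuming the feed is served as UTF-8: decode entities to real characters first, then escape only what XML requires. bloginfo_for_feed() is a hypothetical helper, not part of WordPress, and it changes the output from &amp;mdash; to a literal em dash character, which feed readers may handle differently:

```php
<?php
// Hypothetical alternative (not WordPress code). Assumes UTF-8 output.
function bloginfo_for_feed(string $info): string
{
    $info = strip_tags($info);
    // Turn named and numeric entities into characters: "&mdash;" -> U+2014, "&amp;" -> "&"
    $decoded = html_entity_decode($info, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Re-escape &, <, >, " and ' so the text is safe inside an XML element
    return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo bloginfo_for_feed('think of me&mdash;'), "\n"; // "think of me" followed by U+2014 EM DASH
echo bloginfo_for_feed('salt &amp; pepper'), "\n";  // salt &amp; pepper
```

    Decoding first means a description that already contains &amp; is not double-escaped, whereas the blanket str_replace() would turn it into &amp;amp;.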


This topic has been closed to new replies.
