Parteibuch Aggregator
[resolved] Repairing broken HTML in incoming feeds (1 post)

  1. rcain
    Member
    Posted 3 years ago #

    Came across a horrible problem recently when an incoming RSS feed broke my page HTML completely. The problem was caused by the following truncated, HTML-encoded snippet within the feed:

    ...<description>...blah blah blah... has said. <TABLE style="FLOAT: right; MARGIN: 0px 5</description>...

    Note: no closing quote and no closing tag. When bdprss parses this in, it unfortunately conflicts with its own use of HTML encoding: it converts the snippet back to a literal <TABLE ... and tries its best to rebuild proper HTML syntax. It does pretty well at terminating the table tag itself, but it doesn't even attempt to close the missing quote on the inline style attribute. That makes matters a whole lot worse, i.e. it completely breaks the page.

    Here's my fix:

    file: bdp-rssaggregator.php, function packageItemText (around line 616).

    after:

    // delete unrequired tags
    $string = mb_eregi_replace("<[a-zA-Z]+[^>]*>", '', $string);
    $string = mb_eregi_replace("</[a-zA-Z]+[^>]*>", '', $string);

    add:

    $string = mb_eregi_replace("<[a-zA-Z]+[^>]*$", '', $string);
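    To try the extra pattern outside the plugin, here is a minimal standalone sketch. It is not the plugin's code: the function name strip_tags_and_dangling is hypothetical, and preg_replace stands in for mb_eregi_replace so the snippet runs without the mbstring extension. The three patterns themselves are the ones shown above.

```php
<?php
// Hypothetical helper, not part of bdp-rssaggregator.php.
// Assumption: preg_replace (PCRE) is used in place of mb_eregi_replace.
function strip_tags_and_dangling(string $string): string
{
    // Existing behaviour: delete complete opening and closing tags.
    $string = preg_replace('/<[a-zA-Z]+[^>]*>/', '', $string);
    $string = preg_replace('/<\/[a-zA-Z]+[^>]*>/', '', $string);
    // Added step: delete a tag that starts but never reaches ">"
    // before the end of the string.
    $string = preg_replace('/<[a-zA-Z]+[^>]*$/', '', $string);
    return $string;
}

// The truncated snippet from the feed, after entity decoding.
$broken = '...blah blah blah... has said. <TABLE style="FLOAT: right; MARGIN: 0px 5';
echo strip_tags_and_dangling($broken), "\n";
```

    With the third pattern in place, the dangling <TABLE ... fragment is dropped and only the text before it survives.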

    Then, at the bottom of that same function, around line 737 (the // tighten up the HTML block), find:

            $ret = mb_eregi_replace("(<[a-zA-Z]+[^\>]*>) (<[a-zA-Z]+[^\>]*>)", "\\1\\2", $ret);
        }
        return ($ret);
    }

    Insert the following two lines into the // tighten up the HTML block:

    $ret = mb_eregi_replace("(<[a-zA-Z]+\s+[a-zA-Z]+=)\"([^\>\"]*)>", "\\1\"\\2\">", $ret);
    $ret = mb_eregi_replace("(<[a-zA-Z]+\s+[a-zA-Z]+=)'([^\>']*)>", "\\1'\\2'>", $ret);

    It will now be a lot more forgiving of broken HTML(-like) constructs in incoming feeds.
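    The two quote-closing substitutions can also be checked in isolation. The sketch below is an illustration only: close_dangling_quote is a hypothetical name, and preg_replace is again assumed as a stand-in for mb_eregi_replace. The patterns mirror the two inserted lines.

```php
<?php
// Hypothetical helper, not part of bdp-rssaggregator.php.
// Assumption: preg_replace (PCRE) is used in place of mb_eregi_replace.
function close_dangling_quote(string $html): string
{
    // <tag attr="value>  becomes  <tag attr="value">
    $html = preg_replace(
        '/(<[a-zA-Z]+\s+[a-zA-Z]+=)"([^>"]*)>/',
        '${1}"${2}">',
        $html
    );
    // <tag attr='value>  becomes  <tag attr='value'>
    $html = preg_replace(
        '/(<[a-zA-Z]+\s+[a-zA-Z]+=)\'([^>\']*)>/',
        '${1}\'${2}\'>',
        $html
    );
    return $html;
}

// The table tag after the plugin has terminated it with ">".
echo close_dangling_quote('<TABLE style="FLOAT: right; MARGIN: 0px 5>'), "\n";
```

    A tag whose quotes are already balanced is left alone, because the pattern only matches when ">" arrives before any closing quote. Note that the pattern anchors on the first attribute after the tag name, so an unclosed quote on a later attribute would not be repaired by these two lines.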
