remove HTML-markup from RSS: any suggestions?

  • i really don’t know if this is a topic to deal with, but in my opinion it looks like garbage if xml-content is enriched with html-markup.
    some posts and comments carry a really huge amount of it.

    is it possible that this could affect syndication and ranking results in search engines?

    and if it is, is there any solution or do i have to fix this on my own?

    thx for any reply or help 🙂

Viewing 13 replies - 1 through 13 (of 13 total)
  • “i really don’t know if this is a topic to deal with, but in my opinion it looks like garbage if xml-content is enriched with html-markup.”

    I’ll just enjoy that comment for a moment.

    Ok, one option is to switch to summary display for syndication: Options > Reading, Syndication Feeds.

    Another is to modify your various feed templates to use the_content_rss() instead of the_content() for full text feeds. For example, in wp-rss2.php (RSS 2), look for this line:

    <content:encoded><![CDATA[<?php the_content('', 0, '') ?>]]></content:encoded>

    which I suggest changing to:

    <content:encoded><![CDATA[<?php the_content_rss('', false, '', 0, 2) ?>]]></content:encoded>

    Info on the_content_rss() and its parameters:

    http://codex.wordpress.org/Template_Tags/the_content_rss
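
For readers who want to see what stripping the markup amounts to, here is a minimal standalone sketch in plain PHP. It does not call WordPress at all; it only uses PHP's built-in strip_tags() to illustrate the kind of output a tag-stripping feed template produces. The helper name plain_feed_text() and the sample string are made up for illustration, and the whitespace collapsing is an assumption of this sketch, not something the template tag is documented to do.

```php
<?php
// Hypothetical helper, not part of WordPress: reduce post HTML to plain text,
// roughly what a tag-stripping feed template would emit.
function plain_feed_text($html)
{
    // strip_tags() is a PHP built-in that removes HTML and PHP tags.
    $text = strip_tags($html);
    // Collapse the whitespace left behind where block elements were removed.
    $text = preg_replace('/\s+/', ' ', $text);
    return trim($text);
}

// Made-up sample post body with a link and two paragraphs.
$sample = "<p>Hello <a href=\"http://example.com/\">world</a>.</p>\n<p>Second paragraph.</p>";

echo plain_feed_text($sample), "\n"; // prints: Hello world. Second paragraph.
```

Note that the link target and the paragraph break are both gone from the output, so this is a trade-off rather than a free cleanup.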

    Thread Starter infranic

    (@infranic)

    @kafkaesqui:
    i guessed that my question would cause some amusement for you out there, but as one can see, i’m a newbie to blogging.

    by carefully following your hints i discovered

    1. that your suggested code-changes in the feed-templates work successfully, many thx
    2. that (my) wp always gives me the same feed-output, regardless of which post i am on. the info on the_content_rss() says that it encodes the current post, but (for me) it does not. it always shows the rss for the whole blog. guess i’ll amuse you again…, but something went wrong.

    My enjoyment is not due to amusement, but agreement with your point. Sorry that wasn’t clear.

    On your point 2: what version of WP are you running right now? And do you have a link?

    Thread Starter infranic

    (@infranic)

    i’m running:

    wp 1.5.1,
    php 4.3.3,
    MySQL 4.1.12,
    apache 1.3.30

    my site isn’t online yet; at the moment i’m working on the lan here in our office.

    the header uses, for example, the following tag: <link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="<?php bloginfo('rss2_url'); ?>" />. should some other parameters be passed with it?

    maybe i have to dig into php more than i actually wanted to. ask me anything about perl or javascript, but i have been at daggers drawn with php from its beginning :)…

    glad to meet someone, who cares about poetry!

    <?php bloginfo('rss2_url'); ?> only generates the RSS2 link to the blog as a whole. It’s not post-aware. Individual posts typically provide a feed link using comments_rss_link(), which points to the comments feed for that post. There is no single-post RSS per se.

    Thread Starter infranic

    (@infranic)

    okay, i got it. testing around with the comments-tags will be useful. unfortunately i made the mistake of customizing the layout and structure of my site too early. a real crash course in php. tweak, tweak…

    maybe i will post another question here in the next few days.

    thx a lot, be blessed.

    HTML is the format you’re writing your content in. You need the HTML to be in your RSS feed so that paragraphs break properly, lists show as lists, images show up, etc. in aggregators. This will not affect search engines, but removing HTML tags might make the readers of your syndicated feed miserable.

    It’s a shame I disagree with you on that Firas, it really is.

    Kafkaesqui, I don’t quite understand what there is to disagree about: how are you going to link to things in RSS item content sans HTML?

    I don’t link to things in RSS.

    Ahh, yeah. It depends on what one thinks RSS should do (i.e., what sort of information it should contain), I guess.

    And that quite simply is the key.

    Thread Starter infranic

    (@infranic)

    Ooops, may I hand you a cleaver.

    @firas:

    maybe this will shed some light on feeds and their history:

    RDF = Resource Description Framework

    RSS = Rich Site Summary, later Really Simple Syndication

    If one keeps in mind that any pattern-matching search task in the semantic web has to chew through rich markup with inline styles, javascript etc., one might understand why it can be useful to straighten out feeds so that they can easily be searched for plausible matches.

    In my opinion RSS can be used in many different ways. I will use it as an additional notification of site updates; others may use it for blogging. But in any case the codex intends machine-readable output, and not just any string (with or without markup) will do for that. The basis is always well-formed XML and valid DTDs.

    cheers
