Incomplete Feed Import

Resolved Der-Bank-Blog
(@der-bank-blog)

7 years, 3 months ago

Hi

Thanks for the great plugin. Unfortunately, the import of the XML feed URL is incomplete in my case.
The source contains links, which are not imported.

Instead I find the following code in the import:
<!--#loopitem#--><!--#/loopitem#--><!--#loopitem#-->

Any ideas that might help?

Cheers from Germany

Hansjörg

Viewing 4 replies - 1 through 4 (of 4 total)

Plugin Contributor cyberseo
(@cyberseo)

7 years, 3 months ago

The only way to find out is to look at the RSS feed URL. Please post it here and I’ll give you not just an abstract idea, but the exact answer 🙂

Thread Starter Der-Bank-Blog
(@der-bank-blog)

7 years, 3 months ago

Thanks for the quick answer. Here is the URL to the feed:
http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/mailings.xml

And here is an example of what comes out with your tool:
https://www.catherine-leichsenring.de/jobs/weihnachtspause-deutsche-bank-einigt-sich-mit-us-justiz-staat-stuetzt-monte-dei-paschi-loretta-lynch-martin-neff/

Cheers

Hansjörg

Plugin Contributor cyberseo
(@cyberseo)

7 years, 3 months ago

Here is an example of a typical item in your feed:
```
    <item id="6725565">
      <guid>6725565</guid>
      <author>Bankenbrief <no-reply@bankenverband-service.de></author>
      <title>Sparda-Bank Hannover schränkt Münzannahme drastisch ein / Bank J. Safra Sarasin / Brexit / Monte dei Paschi / David Folkerts-Landau / Elisha Wiesel / Bullshit-Bingo zum Arbeitsstart</title>
      <link>http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/media/6725565.htm</link>
      <description><a href='http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/media/6725565.htm'>HTML</a> <a href='http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/media/6725565.txt'>TXT</a></description>
      <pubDate>Tue, 10 Jan 2017 16:05:34 +0100</pubDate>
      <category id="101969">Bankenbrief</category>
      <enclosure type="image/jpeg" url="http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/media/6725565.jpg"/>
    </item>
```
The only HTML content it contains is the following (two links separated by a space):

<a href='http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/media/6725565.htm'>HTML</a> <a href='http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/media/6725565.txt'>TXT</a>

So it seems you have set CyberSyn to extract the full articles. In this mode the plugin does not use the content from the feed (the one I’ve pasted above). Instead, it takes the post's link to the source (in this example it’s http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/media/6725565.htm) and tries to analyze the HTML page located at that URL. If you open the URL in a browser and look at its source (right-click, then “View Page Source”), you will see it’s very complicated HTML code (with “<!--#/loopitem#-->” blocks, JavaScript, CSS, etc.).
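To make the difference between the two modes concrete, here is a minimal Python sketch (not CyberSyn code, which is PHP). It parses a trimmed copy of the item above and shows what the feed itself offers (two links) versus the URL that full-text mode would fetch. The CDATA wrapper around the description and the names `LinkCollector` and `summarize_item` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Trimmed copy of the item quoted above. Assumption: the real feed wraps the
# description HTML in CDATA (or escapes it); the forum shows it unescaped.
ITEM_XML = """<item id="6725565">
  <link>http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/media/6725565.htm</link>
  <description><![CDATA[<a href='http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/media/6725565.htm'>HTML</a> <a href='http://s3-eu-west-1.amazonaws.com/files.crsend.com/60000/60683/rss/media/6725565.txt'>TXT</a>]]></description>
</item>"""


class LinkCollector(HTMLParser):
    """Collects (label, href) pairs for every <a> element it sees."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text), self._href))
            self._href = None


def summarize_item(item_xml):
    """Return what each import mode would work with for one feed item."""
    item = ET.fromstring(item_xml)
    collector = LinkCollector()
    collector.feed(item.findtext("description", default=""))
    collector.close()
    return {
        # Feed-content mode: all the description holds is these links.
        "feed_content_links": collector.links,
        # Full-text mode: the page at this URL is fetched and analyzed.
        "full_text_source": item.findtext("link"),
    }


summary = summarize_item(ITEM_XML)
for label, href in summary["feed_content_links"]:
    print(label, href)
print("full-text source:", summary["full_text_source"])
```

Either way, the article text itself is not in the feed; it only exists on the linked page.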

AI has not been invented yet, and the full-text extraction script is not a human, but it does its best: it tries to analyze that jungle of code and extract the actual article from it. Usually it succeeds, but sometimes it fails. Once again, it’s just a script…
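As a rough illustration of the kind of clean-up involved, the Python sketch below drops template comments such as “<!--#loopitem#-->” together with script and style blocks, and keeps only the visible text. This is not the algorithm CyberSyn uses and not the CyberSEO parser interface; `SAMPLE_PAGE`, `VisibleText` and `extract_text` are hypothetical names, and the sample markup is invented to stand in for the much larger real page.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the newsletter page; the real one is far larger.
SAMPLE_PAGE = """<html><head><style>p { color: red; }</style>
<script>var x = 1;</script></head>
<body><!--#loopitem#--><p>First story.</p><!--#/loopitem#-->
<!--#loopitem#--><p>Second <a href="http://example.com/a">story</a>.</p><!--#/loopitem#-->
</body></html>"""


class VisibleText(HTMLParser):
    """Keeps text nodes, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

    # handle_comment keeps its default no-op behaviour, so comment blocks
    # such as <!--#loopitem#--> never reach the output.


def extract_text(page_html):
    """Return the visible text of the page with whitespace normalized."""
    parser = VisibleText()
    parser.feed(page_html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


print(extract_text(SAMPLE_PAGE))
```

Note that this naive version also throws away the link targets, which is the kind of loss reported at the top of this thread; a parser written for one specific page layout could keep them.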

The only solution I can suggest is to use the professional CyberSEO plugin, which significantly extends the capabilities of CyberSyn. For example, it allows you to write your own parser, specific to this particular HTML structure, and use it to process that feed strictly by your own rules. I know that not everybody knows PHP, so I provide a custom parser development service to my customers for as low as $10-$15, depending on its complexity.

Thread Starter Der-Bank-Blog
(@der-bank-blog)

7 years, 3 months ago

OK, I understand. Thanks for your kind help.

The topic ‘Incomplete Feed Import’ is closed to new replies.