WordPress.org

WP Web Scraper
Output to html not working ? (5 posts)

  1. jayseventwo
    Member
    Posted 1 year ago #

    Hi, I am trying to grab a table from another part of my site to display on my front page. The content displays fine, but the table markup surrounding it is missing.

    shortcode:

    [wpws url="my_url" selector="#1st" output="html"]

    and table on page is:

    <table id="1st">
    <tbody>
    <tr>
    <th>Date</th>
    <th>Opponent</th>
    <th>Location</th>
    </tr>
    <tr>
    <td>06/04/13</td>
    <td>Port</td>
    <td>Home</td>
    </tr>
    </tbody>
    </table>

    Am I doing something wrong? I need the table so I can format the content.

    Cheers

    http://wordpress.org/extend/plugins/wp-web-scrapper/

  2. uhi888
    Member
    Posted 1 year ago #

    I've got the same problem ...

  3. alchymyth
    The Sweeper & Moderator
    Posted 1 year ago #

    @uhi888

    please post more details;

    as your problem is probably not exactly the same, please start your own topic and include a link to your site.

  4. adamf321
    Member
    Posted 1 year ago #

    Me too. I'm using WP Web Scraper in a widget with the following shortcode:
    [wpws url="http://bbc.co.uk" selector="#business_marketData_items" user_agent="Bot at capman-group.com" on_error="error_show" output="html" striptags="<a>"]

    The only HTML tags left in with output="html" are <span> tags; all the others are stripped out. If I change it to output="text", it strips out the <span> tags too.

    Debug info below. My site isn't live yet so I have nothing to show (I'm developing locally on my machine).

    <!--
     Start of web scrap (created by wp-web-scraper)
     Source URL: http://bbc.co.uk
     Selector: #business_marketData_items
     Xpath:
     Delivered thru: Cache
     WPWS options: Array
    (
        [postargs] =>
        [cache] => 60
        [user_agent] => Bot at capman-group.com
        [timeout] => 2
        [on_error] => error_show
        [output] => html
        [clear_regex] =>
        [clear_selector] =>
        [replace_regex] =>
        [replace_selector] =>
        [replace_with] =>
        [replace_selector_with] =>
        [basehref] =>
        [striptags] => <a>
        [removetags] =>
        [callback] =>
        [debug] => 1
        [htmldecode] =>
        [urldecode] => 1
        [xpathdecode] =>
        [request_mt] => 1353787384.3393
    )
    -->

  5. adamf321
    Member
    Posted 1 year ago #

    I've figured out the "issue"... I was trying to scrape a table by using the CSS id of the <table> tag. The scraper pulls out all the HTML inside that element, but not the <table> tag itself. As a result, Chrome's Inspect Element view showed no table HTML, because the returned markup was badly formed and couldn't be parsed. It all looked OK when I viewed the page source (Ctrl-U in Chrome).
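
    To illustrate the difference between the HTML inside an element and the element including its own tag, here is a minimal sketch using PHP's built-in DOM extension. It is not the plugin's code, and the sample markup and variable names are assumptions made up for the example.

```php
<?php
// Illustration only: NOT how WP Web Scraper is implemented.
// Sample markup modelled on the table from the first post.
$page = '<table id="1st"><tbody><tr><th>Date</th></tr>'
      . '<tr><td>06/04/13</td></tr></tbody></table>';

$doc = new DOMDocument();
$doc->loadHTML($page);

// Take the first <table> element in the parsed document.
$table = $doc->getElementsByTagName('table')->item(0);

// "Outer" HTML: the element serialized together with its own <table> tag.
$outer = $doc->saveHTML($table);

// "Inner" HTML: only the children, which is what the scraper appears to return.
$inner = '';
foreach ($table->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

echo "Outer:\n", $outer, "\n\nInner:\n", $inner, "\n";
```

    The inner version starts at <tbody>, so a browser receiving it outside a <table> has no table to build from the rows.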

    To fix it I used a callback function to add the <table> tags back:

    // Wrap the scraped fragment in the <table> tags that the scraper leaves out.
    function mymodule_add_table_tags($scrap) {
        return '<table>' . $scrap . '</table>';
    }

    And changed the shortcode to:
    [wpws url="http://bbc.co.uk" selector="#business_marketData_items" user_agent="Bot at capman-group.com" on_error="error_show" output="html" striptags="<a>" callback="mymodule_add_table_tags"]
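
    For anyone who needs the table to be styleable afterwards, a slightly more defensive variant of the callback above might look like the sketch below. The function name and the id value "scraped-table" are hypothetical, and it assumes, as the callback above does, that the plugin passes the scraped HTML to the callback as a single string.

```php
<?php
// Hypothetical variant of the callback above; the name and the id are assumptions.
function mymodule_wrap_scraped_table($scrap) {
    $scrap = trim($scrap);

    // If the fragment already starts with a <table> tag, return it unchanged
    // so the markup is never wrapped twice.
    if (stripos($scrap, '<table') === 0) {
        return $scrap;
    }

    // Otherwise add the missing wrapper, with an id that CSS can target.
    return '<table id="scraped-table">' . $scrap . '</table>';
}
```

    It would be referenced from the shortcode in the same way, with callback="mymodule_wrap_scraped_table".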

Topic Closed

This topic has been closed to new replies.
