An easy-to-implement web scraper for WordPress. Display real-time data from any website directly in your posts, pages or sidebar.
Web scraping (also called Web harvesting or Web data extraction) is a software technique for extracting information from websites. It focuses on transforming unstructured Web content, typically HTML, into structured data that can be formatted and displayed, or stored and analyzed. Web scraping is also related to Web automation, which simulates human Web browsing using software. Typical uses of Web scraping include online price comparison, weather data monitoring, market data tracking, Web content mashups and Web data integration.
Use the 'Add new web scrap' button to add a web scrap to your post or page. You can also use the template tag or shortcode detailed below.
WP Web Scraper can be used through a template tag (for direct integration into your theme) or a shortcode (for posts, pages or the sidebar) to scrape and display web content. Usage details:
For use within themes: <?php echo wpws_get_content($url, $selector, $xpath, $wpwsopt); ?>
Example usage in a theme: <?php echo wpws_get_content('http://google.com', 'title', '', 'user_agent=My Bot&on_error=error_show&'); ?> (displays the title tag of Google's home page, using My Bot as the user agent)
For use directly in posts, pages or sidebar (text widget): [wpws url="" selector=""]
Example usage as a shortcode: [wpws url="http://google.com" selector="title" user_agent="My Bot" on_error="error_show"] (displays the title tag of Google's home page, using My Bot as the user agent)
For other, more advanced parameters, refer to the Usage Manual.
Further details about selector syntax are available in Selectors.
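To illustrate what a selector or XPath expression picks out of a page, here is a minimal standalone PHP sketch. It does not use the plugin itself: WP Web Scraper relies on phpQuery for CSS selectors and JS_Extractor for XPath, whereas this sketch swaps in PHP's built-in DOM extension (`DOMDocument`, `DOMXPath`). The helper `extract_by_xpath` and the sample HTML are hypothetical, made up for illustration. It shows XPath expressions roughly equivalent to the CSS selectors `title` and `ul.prices li`.

```php
<?php
// Illustrative sketch only: WP Web Scraper itself uses phpQuery (CSS selectors)
// and JS_Extractor (XPath); this uses PHP's built-in DOM extension instead.

/**
 * Return the trimmed text content of every node matching an XPath expression.
 * extract_by_xpath() is a hypothetical helper, not part of WP Web Scraper.
 */
function extract_by_xpath(string $html, string $expression): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is often malformed.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query($expression) as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}

// Hypothetical sample page.
$html = '<html><head><title>Example Domain</title></head>'
      . '<body><ul class="prices"><li>9.99</li><li>12.50</li></ul></body></html>';

// The CSS selector "title" corresponds to the XPath expression "//title".
print_r(extract_by_xpath($html, '//title'));

// The CSS selector "ul.prices li" roughly corresponds to this XPath expression.
print_r(extract_by_xpath(
    $html,
    '//ul[contains(concat(" ", normalize-space(@class), " "), " prices ")]/li'
));
```

The class test in the second expression is the usual XPath idiom for matching one class among several; a plain `@class="prices"` comparison would miss elements that carry additional classes. The exact XPath dialect accepted by JS_Extractor may differ from PHP's libxml-based implementation.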
Yes, you can. However, you should respect the copyright of the content owner. It's best to at least credit the content owner with a link back, or better, to obtain written permission. Rights aside, scraping in general is a very resource-intensive task: it consumes bandwidth on your host as well as on the content owner's host. It's best not to overdo it. Ideally, find single pages with enough content to create your mashup.
Here are some tips to help you optimize the usage:
For scraping, the plugin primarily uses the WP_HTTP classes. For caching, it uses the Transients API. For parsing HTML with CSS-style selectors, the plugin uses phpQuery, a server-side, chainable, CSS3-selector-driven Document Object Model (DOM) API based on the jQuery JavaScript library; for XPath parsing, it uses JS_Extractor.
Requires: 2.8 or higher
Compatible up to: 3.1.4
Last Updated: 2012-1-27
Downloads: 18,203




