WordPress.org

Forums

WP Web Scraper
problem with webpage with some "../whatever" and some "/whatever" links (3 posts)

  1. rl7greg
    Member
    Posted 2 years ago #

    I am trying to scrape a section of a website; let's say it's http://whatever.com/news/index.html

    some links are like this:
    <a href="../downloads(snip)

    but images are like this:
    <img src="/whatever.JPG"(snip)

    I tried basehref="http://whatever.com"
    (but then the ../something links don't work)

    I tried basehref="http://whatever.com/news/"
    (but then the /something links don't work)

    How do I delete ".." in the URLs? I don't think I'm using replace_text / replace_with or clear_regex correctly, because they don't seem to work.

    Thanks if you can help

    http://wordpress.org/extend/plugins/wp-web-scrapper/

  2. rl7greg
    Member
    Posted 2 years ago #

    I'm trying this page exactly:
    [wpws url="http://www.sobhd.net/news/index.shtml" selector="div.news:eq(0)" cache="0" basehref="http://www.sobhd.net"]

    (With that, the images work, but links to documents that were originally "../whatever" do not.)

  3. rl7greg
    Member
    Posted 2 years ago #

    fixed it with:
    [wpws url="http://www.sobhd.net/news/index.shtml" selector="div.news:eq(0)" cache="0" basehref="http://www.sobhd.net/"]

    Image links end up as sobhd.net//imagewhatever.jpg, but that still works.

    It would be nice to know how to actually remove ".." from the URL, though.
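
For anyone landing on this thread with the same question: the ".." does not need to be deleted by hand if each link is resolved against the URL of the scraped page itself, which is how a browser treats relative links. The sketch below is not part of WP Web Scraper and says nothing about how basehref is implemented; it is plain Python using the standard library's urljoin, and the file names downloads/file.pdf and whatever.JPG are hypothetical stand-ins for the snipped links in the first post.

```python
from urllib.parse import urljoin

# URL of the scraped page, taken from the shortcode in the thread.
PAGE_URL = "http://www.sobhd.net/news/index.shtml"


def resolve(link, base=PAGE_URL):
    """Resolve a relative link against the page it was found on."""
    return urljoin(base, link)


# "../" climbs out of /news/, so the ".." disappears from the result.
# (downloads/file.pdf is a hypothetical path, not one from the real site.)
print(resolve("../downloads/file.pdf"))

# A leading "/" is resolved against the site root, ignoring /news/.
# (whatever.JPG is the placeholder name used in the first post.)
print(resolve("/whatever.JPG"))
```

Both kinds of link come out as absolute URLs from a single base, which is the case that neither basehref="http://whatever.com" nor basehref="http://whatever.com/news/" covered on its own. The results reported above would be consistent with basehref being prepended to each link as plain text, but that is only a guess from the URLs quoted in the thread.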

This topic has been closed to new replies.