Problem with webpage with some "../whatever" and some "/whatever" links

  • I am trying to scrape a section of a website; let’s say it’s http://whatever.com/news/index.html

    Some links are like this:
    <a href="../downloads(snip)

    but images are like this:
    <img src="/whatever.JPG"(snip)

    I tried basehref="http://whatever.com"
    (but then the ../something links don’t work).

    I tried basehref="http://whatever.com/news/"
    (but then the /something links don’t work).

    How do I delete “..” in the URLs? I don’t think I’m using replace_text / replace_with or clear_regex correctly, because they don’t seem to have any effect.

    Thanks if you can help

    http://wordpress.org/extend/plugins/wp-web-scrapper/
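    For comparison, here is a minimal sketch of how standard URL resolution (the rules a browser follows) treats the two link styles when they are resolved against the page's own URL. It uses Python's urllib.parse.urljoin and has nothing to do with the plugin's internals; how the plugin applies basehref is not shown here. The path "../downloads/file.pdf" is a made-up stand-in, since the real path is snipped above.

```python
from urllib.parse import urljoin

# The page being scraped (the placeholder URL from the question).
PAGE_URL = "http://whatever.com/news/index.html"

# Hypothetical link targets: the question elides the real paths with "(snip)".
doc_link = "../downloads/file.pdf"  # parent-relative, like the document links
img_link = "/whatever.JPG"          # root-relative, like the images

# urljoin resolves each reference against the page URL and collapses "..".
print(urljoin(PAGE_URL, doc_link))
print(urljoin(PAGE_URL, img_link))
```

    Under these rules both styles resolve cleanly, to http://whatever.com/downloads/file.pdf and http://whatever.com/whatever.JPG, so the mismatch seen above presumably comes from how basehref gets combined with each link rather than from the links themselves.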

Viewing 2 replies - 1 through 2 (of 2 total)
  • This is the exact page I’m trying:
    [wpws url="http://www.sobhd.net/news/index.shtml" selector="div.news:eq(0)" cache="0" basehref="http://www.sobhd.net"]

    With that, the images work, but links to documents that were originally “../whatever” do not.

    Fixed it by adding a trailing slash to basehref:
    [wpws url="http://www.sobhd.net/news/index.shtml" selector="div.news:eq(0)" cache="0" basehref="http://www.sobhd.net/"]

    Image links end up as sobhd.net//imagewhatever.jpg, but that still works.

    It would still be nice to know how to actually remove “..” from the URL, though.
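    Outside the plugin, stripping the “..” could be sketched roughly like this in Python. This is only an illustration of the regex idea, not the syntax or behaviour of the plugin's clear_regex or replace_text options, which I have not verified. The function name and the sample path are made up. It also assumes the scraped page sits exactly one directory below the site root (as /news/index.shtml does), so that a leading "../" means the same thing as a leading "/".

```python
import re

# Matches the start of an href/src attribute value that begins with one or
# more "../" segments. Group 1 keeps the attribute name and opening quote.
LEADING_DOTDOT = re.compile(r'''(\b(?:href|src)\s*=\s*["'])(?:\.\./)+''', re.IGNORECASE)

def strip_leading_dotdot(html: str) -> str:
    """Rewrite href="../x" as href="/x" (hypothetical helper).

    Only valid when the source page is one directory below the site root.
    """
    return LEADING_DOTDOT.sub(r"\1/", html)

# Hypothetical fragment: the real paths are not shown in this thread.
fragment = '<a href="../downloads/file.pdf">doc</a> <img src="/whatever.JPG">'
print(strip_leading_dotdot(fragment))
```

    After this, every link is root-relative, so a single base such as http://www.sobhd.net would suit both the documents and the images.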
