Support » Plugin: Broken Link Checker » Lots of valid links showing as broken

  • Resolved Tina

    (@sunflowermom)


    This has always happened from time to time…but lately, maybe the past couple months, this has been happening a LOT.

    Links that are very clearly not broken are being tagged as broken, and I don’t have any idea why. Lots of 403 Forbidden, but other messages as well.

    I’ll pay more attention to them if I know what I’m looking for. Why are so many being inappropriately tagged?

Viewing 10 replies - 1 through 10 (of 10 total)
  • Plugin Support Patrick – WPMU DEV Support

    (@wpmudevsupport12)

    Hi @sunflowermom

    Sorry to hear you are having this issue.

    Broken Link Checker uses your server's IP address to request each link and record the result. Depending on the target site, that IP may be blocked or the request may be refused. Could you share some of the reported links so we can take a look?

    Best Regards
    Patrick Freitas

    Thread Starter Tina

    (@sunflowermom)

    I have a list of links that I’ve had to click “dismiss” on due to this problem.

    Here are the latest ones that come back as 403 forbidden but the links work fine:
    https://www.thriftbooks.com/
    https://www.cagreatamerica.com/
    https://www.knotts.com/
    https://www.knotts.com/groups/student-and-youth/adventures-in-education

    This video embeds the link on this post for some reason…but it’s a Kickstarter link that comes back as 403…

    This one shows a 456 Unknown Error:
    https://www.walmart.com/ip/Super-Sticky-Laptop-Note-Dispenser/23340736

    • This reply was modified 1 year, 3 months ago by Tina.
    Plugin Support Patrick – WPMU DEV Support

    (@wpmudevsupport12)

    Hi @sunflowermom

    Thank you.

    I could replicate this behaviour using the Broken Link Checker plugin. I can also see that the URLs are not blocking a cURL request:

    ~$ curl -IL -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36" https://www.cagreatamerica.com/
    HTTP/1.1 200 
    Date: Thu, 30 Jun 2022 15:24:01 GMT
    Content-Type: text/html;charset=UTF-8
    Connection: keep-alive

    But for some reason, it blocks the Plugin request.

    It could be related to the server IP, since the plugin uses the website's server IP to make the request. I have pinged our developers to take a closer look, and we will keep you posted.
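The comparison described above (a browser-like cURL request succeeds while the plugin's request is refused) can be repeated from any machine. Below is a minimal Python sketch using only the standard library. The helper names `build_head_request` and `head_status` are made up for illustration, and this is not how the plugin itself issues requests.

```python
import urllib.error
import urllib.request

# The same browser User-Agent string used in the cURL command above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36"
)


def build_head_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a HEAD request that carries the given User-Agent header."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )


def head_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server answers with.

    Network-level failures (DNS errors, timeouts) raise urllib.error.URLError
    and are deliberately not handled here.
    """
    request = build_head_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return error.code
```

Calling `head_status(url, BROWSER_UA)` and then `head_status(url, "some-bot/1.0")` for the same URL, ideally from the web server itself, shows whether the User-Agent or the source IP changes the answer.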

    Best Regards
    Patrick Freitas

    Thread Starter Tina

    (@sunflowermom)

    Here’s another: https://malwaretips.com/blogs/live-security-platinum-virus/

    I just rechecked it and now it says 200 OK…but it came up in my list of broken links and this particular link has been coming up repeatedly over the last few months.

    Plugin Support Adam – WPMU DEV Support

    (@wpmudev-support8)

    Hi @sunflowermom

    I hope you’re well today!

    Issues like this usually happen with URLs that point to services using Cloudflare or a similar CDN, or some other kind of proxy/firewall. In many cases these services recognize the requests the plugin makes to check a URL as “automated” and treat them as “unwanted bots”.

    If that happens, the plugin may report the link as broken (because it really does get, for example, a 403 Forbidden response) even though the link works fine when you visit it directly.

    In fact, even if you are checking many different target URLs with the plugin from the same site/server, Cloudflare or a similar service will still see “massive” automated traffic from your IP, hence the temporary “lockout” that results in a broken link report. As I mentioned, this may be Cloudflare, but it may just as well be another proxy/firewall solution at the target site.
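To make the distinction concrete, here is a small Python sketch that separates "probably refused by a firewall" from "probably really broken" based on the status code alone. The set of codes is an assumption chosen for illustration (403 and 456 appear in this thread, 429 is the standard "Too Many Requests" code, and 503 is sometimes used for challenge pages). It is not the plugin's actual logic.

```python
# Status codes that often mean "request refused" rather than "page gone".
# ASSUMPTION: this set is illustrative, not taken from Broken Link Checker.
LIKELY_BLOCKED = {403, 429, 456, 503}


def classify(status: int) -> str:
    """Roughly classify an HTTP status code returned by a link check."""
    if 200 <= status < 400:
        return "ok"
    if status in LIKELY_BLOCKED:
        return "possibly blocked"
    return "broken"
```

A link classified as "possibly blocked" is worth opening in a browser before it is treated as dead.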

    But if links like the one you shared in the post above are also sometimes detected as 200 OK, then decreasing the frequency of link re-checking may help. There’s a “Check each link” option on the “Settings -> Link Checker” page where you set how often (in hours) each link should be rechecked. It may be worth setting this to a much higher number than it currently is, so that links are rechecked less frequently. That may help with at least some of these links.

    Kind regards,
    Adam

    Thread Starter Tina

    (@sunflowermom)

    So it looks like one of my sites is currently set to check every 48 hours…another every 72 hours…should I set it to once a week instead?

    Plugin Support Patrick – WPMU DEV Support

    (@wpmudevsupport12)

    Hi @sunflowermom

    Checking less often will help prevent the target server from blocking your requests, but it only helps and won’t remove all false positives.
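As a rough illustration of what a longer interval changes, this Python sketch converts the "Check each link" value (in hours) into checks per link per week. The function name is made up for the example, and it assumes every link is rechecked exactly once per interval.

```python
HOURS_PER_WEEK = 7 * 24


def checks_per_week(interval_hours: float) -> float:
    """Number of rechecks one link receives per week at the given interval."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    return HOURS_PER_WEEK / interval_hours
```

With the values mentioned above, a 48-hour interval means 3.5 checks per link per week, while a weekly interval (168 hours) means one.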

    We have also escalated this to our developers to verify why it is happening on your site. The Broken Link Checker team is working on a new engine. We don’t have an estimated time yet, but it should improve the scan process a lot.

    Best Regards
    Patrick Freitas

    Thread Starter Tina

    (@sunflowermom)

    Ok thank you.

    Plugin Support Patrick – WPMU DEV Support

    (@wpmudevsupport12)

    Hi @sunflowermom

    I hope you are doing well

    We investigated the links, and the problem is that the site returns a CAPTCHA screen in response to the request:

    <!DOCTYPE html>
    <html lang="en">
    <head runat="server">
    <meta name="viewport" content="initial-scale=1.0, width=device-width, maximum-scale=1.0" />
    <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate" />
    <meta http-equiv="Pragma" content="no-cache" />
    <meta http-equiv="Expires" content="0" />
    <title>Captcha Page</title>
    <link rel="stylesheet" href="https://assets.ntcacdn.net/Mitigations/captcha-1.0.0.css">
    <script src="https://www.recaptcha.net/recaptcha/api.js" async defer></script>
    </head>
    <body>
    <div class="wrapper">
    <header>
    <h1>Help us verify real visitors</h1>
    </header>
    <p>Please complete to continue</p>
    <form id="frmCaptcha" action="" method="POST">
    <div class="g-recaptcha" data-sitekey="6LfuYNwbAAAAAB5s2yUHe7IO1bjlLWCbijXzmCJ_" data-callback="showButton"></div>
    <br />
    <input type="submit" value="Submit" class="btn btnHidden">
    <input type="hidden" maxlength="40" id="hitid" name="hitid" value="506836e3-4f4e-4c23-af86-f9766ee17900">
    </form>
    </div>
    <script src="https://assets.ntcacdn.net/Mitigations/fetch-polyfill-3.6.2.js"></script>
    <script src="https://assets.ntcacdn.net/Mitigations/submit-captcha-2.0.2.js"></script>
    </body>
    </html>

    I am afraid the plugin won’t be able to process those CAPTCHA images, but we are working on a new engine that should be an improved experience and, I hope, will prevent similar issues.
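A checker can at least notice that it received a challenge page instead of the real content. The Python sketch below looks for the markers visible in the response quoted above (the `g-recaptcha` widget, the reCAPTCHA script URL, and a title containing "captcha"). The function name and the marker list are assumptions for illustration. Other protection services use different markup, and this is not part of the plugin.

```python
import re

# Substrings taken from the challenge page quoted above.
CAPTCHA_MARKERS = ("g-recaptcha", "recaptcha/api.js")
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def looks_like_captcha(html: str) -> bool:
    """Heuristically decide whether an HTML body is a CAPTCHA challenge."""
    lowered = html.lower()
    if any(marker in lowered for marker in CAPTCHA_MARKERS):
        return True
    match = TITLE_RE.search(html)
    return bool(match and "captcha" in match.group(1).lower())
```

A response that matches could be reported as "needs manual check" rather than "broken".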

    Best Regards
    Patrick Freitas

    Plugin Support Nithin – WPMU DEV Support

    (@wpmudevsupport11)

    Hi @sunflowermom,

    Since we haven’t heard from you for a while, I’ll mark this thread as resolved for now. Please feel free to re-open the thread if you need further assistance.

    Best Regards
    Nithin

  • The topic ‘Lots of valid links showing as broken’ is closed to new replies.