Lots of valid links showing as broken
-
This has always happened from time to time…but lately, maybe the past couple months, this has been happening a LOT.
Links that are very clearly not broken that are being tagged as broken and I don’t have any idea why. Lots of 403 forbidden. But other messages as well.
I’ll pay more attention to them if I know what I’m looking for. Why are so many being inappropriately tagged?
-
Sorry to hear you are having this issue.
The Broken Link Checker uses your server’s IP address to request each link and record the result. Depending on the site it requests, that IP may be blocked or the request may not be allowed. Could you share some of the reported links so we can take a look?
Best Regards
Patrick Freitas
I have a list of links that I’ve had to click “dismiss” on due to this problem.
Here are the latest ones that come back as 403 forbidden but the links work fine:
https://www.thriftbooks.com/
https://www.cagreatamerica.com/
https://www.knotts.com/
https://www.knotts.com/groups/student-and-youth/adventures-in-education
This video embeds the link on this post for some reason…but it’s a Kickstarter link that comes back as 403…
This one shows a 456 Unknown Error:
https://www.walmart.com/ip/Super-Sticky-Laptop-Note-Dispenser/23340736
Thank you.
I could replicate this behaviour using the Broken Link Checker plugin. I can see the URLs are not blocking a cURL request:
~$ curl -IL -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36" https://www.cagreatamerica.com/
HTTP/1.1 200
Date: Thu, 30 Jun 2022 15:24:01 GMT
Content-Type: text/html;charset=UTF-8
Connection: keep-alive
But for some reason, the site blocks the plugin’s request.
It could be related to the server IP, since the plugin uses the website/server IP to make the request. I pinged our developers to take a closer look, and we will keep you posted.
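A rough Python equivalent of the cURL check above can help compare results from different machines, for example your own computer versus the web server. This is only a sketch: the function names are made up for illustration, the User-Agent string is the one from the cURL command, and nothing here reflects how the plugin itself issues requests.

```python
import urllib.error
import urllib.request

# Same browser-like User-Agent as in the cURL command above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36"
)


def build_head_request(url, user_agent=BROWSER_UA):
    """Build a HEAD request (hypothetical helper) with the given User-Agent."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )


def fetch_status(url, user_agent=BROWSER_UA, timeout=10):
    """Return the HTTP status code the server answers with.

    urllib raises HTTPError for 4xx/5xx answers, so the code is read
    from the exception in that case. Connection failures (URLError)
    are deliberately left to propagate.
    """
    request = build_head_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    # Performs a real network request when run directly.
    print(fetch_status("https://www.cagreatamerica.com/"))
```

If this returns 200 from a home connection but 403 when run on the web server, that would be consistent with the server IP being the deciding factor.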
Best Regards
Patrick Freitas
Here’s another: https://malwaretips.com/blogs/live-security-platinum-virus/
I just rechecked it and now it says 200 OK…but it came up in my list of broken links and this particular link has been coming up repeatedly over the last few months.
I hope you’re well today!
Issues like this usually happen for URLs that point to services using CloudFlare or a similar CDN and/or some other kind of proxy/firewall. In many cases these services “recognize” the requests the plugin makes to check a URL as “automated” and consider them “unwanted bots”.
If that happens, the plugin may report the link as broken (because it really does get, e.g., a 403 Forbidden response) even though it works fine when you visit it directly.
In fact, even if you are checking multiple different target URLs with the plugin from the same site/server, CloudFlare or a similar service will still see “massive” automated traffic from your IP, hence the temporary “lockout” resulting in a broken link report. As I mentioned, this may be CloudFlare, but it may just as well be another proxy/firewall solution at the target site.
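To make the distinction between “blocked” and “broken” concrete, here is a minimal sketch of how reported status codes could be triaged when reviewing the list by hand. The groupings and the function name are assumptions chosen for illustration; they are not the plugin’s actual rules.

```python
# Assumption: illustrative groupings only, not the plugin's logic.
POSSIBLY_BLOCKED = {401, 403, 429}  # typical answers from bot protection
DEFINITELY_BROKEN = {404, 410}      # the page really is gone


def classify_status(code):
    """Map an HTTP status code to a rough triage label."""
    if 200 <= code < 400:
        return "ok"
    if code in POSSIBLY_BLOCKED:
        return "possibly blocked"
    if code in DEFINITELY_BROKEN:
        return "broken"
    # Non-standard codes (such as the 456 reported in this thread)
    # and server errors are left for a human to look at.
    return "needs manual review"


if __name__ == "__main__":
    for code in (200, 403, 404, 456):
        print(code, classify_status(code))
```

Under this scheme a 403 on a link that opens fine in a browser is a candidate for “dismiss” rather than a link to remove.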
But if links like the one you shared in the post above are also sometimes detected as 200 OK, then it may help to decrease the frequency of link re-checking. There’s a “Check each link” option on the “Settings -> Link Checker” page where you set how often (in hours) each link should be rechecked. It may be worth setting this value much higher than it currently is so that links are rechecked less frequently. It may help with at least some of these links.
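As a back-of-the-envelope illustration of why a longer interval reduces automated traffic, the sketch below estimates the average number of outgoing checks per day for a given recheck interval. The link count of 500 is a made-up figure, and the calculation assumes checks are spread evenly over the interval, which may not match how the plugin actually schedules them.

```python
def average_checks_per_day(link_count, interval_hours):
    """Average outgoing link checks per day if every link is
    rechecked once per interval (assumes an even spread)."""
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    return link_count * 24 / interval_hours


if __name__ == "__main__":
    hypothetical_link_count = 500  # assumption, not from the thread
    for hours in (48, 72, 168):    # 168 hours = once a week
        per_day = average_checks_per_day(hypothetical_link_count, hours)
        print(f"every {hours} h: about {per_day:.0f} checks per day")
```

The absolute numbers depend entirely on how many links a site has; the point is only that the request volume falls in proportion to the interval.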
Kind regards,
Adam
So it looks like one of my sites is currently set to check every 48 hours…another every 72 hours…should I set it to once a week instead?
Reducing how often links are scanned helps keep the target server from blocking your requests, but it only helps; it won’t remove all false positives.
We also escalated this to our developers to verify why it is happening on your site. The Broken Link Checker team is working on a new engine; we don’t have an estimated time yet, but it will improve the scan process a lot.
Best Regards
Patrick Freitas
Ok thank you.
I hope you are doing well.
We investigated the links. The issue is that the site returns a CAPTCHA screen in response to the request:
<!DOCTYPE html>
<html lang="en">
<head runat="server">
  <meta name="viewport" content="initial-scale=1.0, width=device-width, maximum-scale=1.0" />
  <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate" />
  <meta http-equiv="Pragma" content="no-cache" />
  <meta http-equiv="Expires" content="0" />
  <title>Captcha Page</title>
  <link rel="stylesheet" href="https://assets.ntcacdn.net/Mitigations/captcha-1.0.0.css">
  <script src="https://www.recaptcha.net/recaptcha/api.js" async defer></script>
</head>
<body>
  <div class="wrapper">
    <header>
      <h1>Help us verify real visitors</h1>
    </header>
    <p>Please complete to continue</p>
    <form id="frmCaptcha" action="" method="POST">
      <div class="g-recaptcha" data-sitekey="6LfuYNwbAAAAAB5s2yUHe7IO1bjlLWCbijXzmCJ_" data-callback="showButton"></div>
      <br />
      <input type="submit" value="Submit" class="btn btnHidden">
      <input type="hidden" maxlength="40" id="hitid" name="hitid" value="506836e3-4f4e-4c23-af86-f9766ee17900">
    </form>
  </div>
  <script src="https://assets.ntcacdn.net/Mitigations/fetch-polyfill-3.6.2.js"></script>
  <script src="https://assets.ntcacdn.net/Mitigations/submit-captcha-2.0.2.js"></script>
</body>
</html>
I am afraid the plugin won’t be able to get past that CAPTCHA, but we are working on a new engine that will be an improved experience and, I hope, will prevent similar issues.
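For anyone who wants to confirm that a “broken” link is really a CAPTCHA interstitial like the one above, the following sketch scans a response body for two markers visible in that page: an element with the `g-recaptcha` class and a title containing “captcha”. The class and function names are hypothetical, and other protection services will use different markup, so treat this as a heuristic rather than a reliable detector.

```python
from html.parser import HTMLParser


class CaptchaDetector(HTMLParser):
    """Flags pages that look like a reCAPTCHA challenge (heuristic)."""

    def __init__(self):
        super().__init__()
        self.found = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        # The class attribute may be missing or empty.
        classes = (dict(attrs).get("class") or "").split()
        if "g-recaptcha" in classes:
            self.found = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and "captcha" in data.lower():
            self.found = True


def looks_like_captcha(html_text):
    """Return True if the HTML contains either CAPTCHA marker."""
    detector = CaptchaDetector()
    detector.feed(html_text)
    detector.close()
    return detector.found


if __name__ == "__main__":
    sample = (
        "<html><head><title>Captcha Page</title></head>"
        '<body><div class="g-recaptcha"></div></body></html>'
    )
    print(looks_like_captcha(sample))
```

A link whose response body matches is being challenged, not missing, so dismissing it in the plugin is reasonable.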
Best Regards
Patrick Freitas
Hi @sunflowermom,
Since we haven’t heard from you for a while, I’ll mark this thread as resolved for now. Please feel free to re-open the thread if you need further assistance.
Best Regards
Nithin
- The topic ‘Lots of valid links showing as broken’ is closed to new replies.