[Resolved] Linkchecker and other legit bots are broken

  • Bug 1: Broken Link Checker is one of the most commonly used plugins, but certain requests, in particular its checks of images on my site, are blocked with a 403 error (see below). I did not turn on hotlinking of images.

    I’m using BPS 47.8 and WP 3.5.1

    Bug 2: I’m also having the same problem this guy is having with the Facebook block, which is still unsolved:
    https://wordpress.org/support/topic/403-errors-2?replies=21

    Bug 3: The new line
    DirectoryIndex index.php index.html /index.php
    took a while to track down, but it looks like “/index.php” was really messing up my installation, where I have enabled Apache directory listing on certain directories. Specifically, it was causing a 403 error, and commenting out that line fixed the problem. Now, any time this plugin is updated, I will have to comment out that line again.

    Thank you very much for your efforts!

    (log anonymized; note that the link checker impersonates IE)
    >>>>>>>>>>> 403 Error Logged – February 1, 2013 – 4:24 pm <<<<<<<<<<<
    REMOTE_ADDR: 123.123.123.123
    Host Name: 123.123.123.123
    HTTP_CLIENT_IP:
    HTTP_FORWARDED:
    HTTP_X_FORWARDED_FOR:
    HTTP_X_CLUSTER_CLIENT_IP:
    REQUEST_METHOD: GET
    HTTP_REFERER: 123.123.123.123
    REQUEST_URI: /wp-content/uploads/2012/09/my-image.png
    QUERY_STRING:
    HTTP_USER_AGENT: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)

Viewing 15 replies - 16 through 30 (of 57 total)
  • Plugin Author AITpro

    @aitpro

    Nice on the Broken Link Checker fix!

    If this problem is related to caching at all, it is definitely not caused by any WordPress caching plugin; it would be something else that I have not figured out. I think it is somehow related to the Header rather than to cache.

    Now I am even more confused after doing several tests with the Facebook Developers Debugger. I get a 404, not a 403, error, and yet Facebook is successfully seeing and retrieving the image??? Total contradiction: either the URL IS found or it is NOT found; it can’t be both??? Something else is happening here that I cannot see because I do not have access to the Facebook externalhit_uatext.php file/script.

    So the image file is found = 200 OK.
    Since Facebook is retrieving the image files but saying it is not able to retrieve them, I am stumped and can only guess that the Header that is returned is not being interpreted correctly.
    This 404 error really confuses me. I guess the 404 comes from whatever else the externalhit_uatext.php file/script is trying to retrieve, and not from the image file itself???

    >>>>>>>>>>> 404 Error Logged [02/07/2013 10:28 PM] <<<<<<<<<<<
    REMOTE_ADDR: 173.252.110.117
    Host Name: 173.252.110.117
    HTTP_CLIENT_IP:
    HTTP_FORWARDED:
    HTTP_X_FORWARDED_FOR:
    HTTP_X_CLUSTER_CLIENT_IP:
    REQUEST_METHOD: GET
    HTTP_REFERER:
    REQUEST_URI: /aitpro-blog/wp-content/themes/aitpro/images/bps-45-website-protection.png
    QUERY_STRING:
    HTTP_USER_AGENT: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
    Plugin Author AITpro

    @aitpro

    OK, I am at a dead end here, and the Facebook Developers Debug tool is very limited: I can only make 1 request, so I cannot do repeated trial-and-error tests. I will have to look at this some other day. In any case, what is most important is that the image files are being retrieved, so whatever else this error is, it is not important and has no negative impact besides being a damn nuisance. 😉

    I’d be curious to know if the Facebook problem stops if HEAD is removed. I will hopefully be testing this later tonight.

    Plugin Author AITpro

    @aitpro

    hmm I think that was tried already, but yeah maybe the most obvious thing is the issue. 😉

    I wonder if Facebook falls back to GET if HEAD fails. In that case we’d record a block, but a second request would also be passed through.
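
To make the HEAD-then-GET guess concrete, here is a minimal Python sketch. It is only an illustration of the idea: nothing in this thread confirms that facebookexternalhit behaves this way, the names `probe` and `server_blocking_head` are made up, and the "server" is reduced to a function that returns a status code.

```python
from typing import Callable, List, Tuple


def probe(send: Callable[[str], int]) -> Tuple[int, List[str]]:
    """Try HEAD first; if the server refuses it, retry the same URL with GET."""
    attempts: List[str] = []
    status = 0
    for method in ("HEAD", "GET"):
        status = send(method)
        attempts.append(f"{method} -> {status}")
        if status < 400:  # success or redirect: no need to fall back
            break
    return status, attempts


def server_blocking_head(method: str) -> int:
    """Hypothetical site that forbids HEAD but serves GET normally."""
    return 403 if method == "HEAD" else 200


if __name__ == "__main__":
    final_status, attempts = probe(server_blocking_head)
    print(final_status, attempts)
```

Under that assumption the site would log one blocked request while the crawler still ends up with the image, which is the combination being described.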

    Plugin Author AITpro

    @aitpro

    I removed HEAD and I did not get an error logged, but I do not think the Facebook Debugger tool is actually sending Requests to my site anymore. I assume they have some sort of abuse protection set up so that someone cannot just sit there and click Debug all day long?

    Plugin Author AITpro

    @aitpro

    Going purely by the HTTP 206 Error on the Facebook end, these would be the areas to look at:

    10.2.7 206 Partial Content

    The server has fulfilled the partial GET request for the resource. The request MUST have included a Range header field (section 14.35) indicating the desired range, and MAY have included an If-Range header field (section 14.27) to make the request conditional.

    The response MUST include the following header fields:

    – Either a Content-Range header field (section 14.16) indicating the range included with this response, or a multipart/byteranges Content-Type including Content-Range fields for each part. If a Content-Length header field is present in the response, its value MUST match the actual number of OCTETs transmitted in the message-body.
    – Date
    – ETag and/or Content-Location, if the header would have been sent in a 200 response to the same request
    – Expires, Cache-Control, and/or Vary, if the field-value might differ from that sent in any previous response for the same variant

    If the 206 response is the result of an If-Range request that used a strong cache validator (see section 13.3.3), the response SHOULD NOT include other entity-headers. If the response is the result of an If-Range request that used a weak validator, the response MUST NOT include other entity-headers; this prevents inconsistencies between cached entity-bodies and updated headers. Otherwise, the response MUST include all of the entity-headers that would have been returned with a 200 (OK) response to the same request.

    A cache MUST NOT combine a 206 response with other previously cached content if the ETag or Last-Modified headers do not match exactly, see 13.5.4.

    A cache that does not support the Range and Content-Range headers MUST NOT cache 206 (Partial) responses.
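
As the quoted section says, a 206 is what a server sends when it has fulfilled a GET that carried a Range header. Below is a minimal Python sketch of that exchange, handling only a single bytes range. The function name `partial_response` and the sample body are hypothetical, and real servers such as Apache do considerably more (multipart ranges, If-Range, validators).

```python
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def partial_response(body: bytes, range_header: str) -> Response:
    """Answer a single 'bytes=first-last' Range header with 206, 416 or 200."""
    total = len(body)
    full: Response = (200, {"Content-Length": str(total)}, body)

    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return full  # unknown unit or several ranges: this sketch sends everything
    first, sep, last = spec.strip().partition("-")
    if sep != "-" or not (first + last).isdecimal():
        return full  # malformed range: ignore the header

    if first == "":  # suffix form 'bytes=-N': the last N bytes
        start, end = max(total - int(last), 0), total - 1
    else:
        start = int(first)
        if last and int(last) < start:
            return full  # last < first is invalid, so the header is ignored
        end = min(int(last), total - 1) if last else total - 1

    if start >= total:  # nothing in the body satisfies the range
        return 416, {"Content-Range": f"bytes */{total}"}, b""

    payload = body[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(payload)),  # must match the bytes actually sent
    }
    return 206, headers, payload


if __name__ == "__main__":
    print(partial_response(b"0123456789", "bytes=2-5"))
```

The Content-Range and Content-Length values in the 206 branch correspond to the first item in the list of required header fields above.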

    OK, I’ve run my other site for about 12 hours with BulletProof turned on, but with HEAD removed.

    I’m still encountering the Facebook problem, and it also looks like WordPress has been blocking itself. See the following, which has been anonymized.

    Any progress?

    >>>>>>>>>>> 403 Error Logged – February 7, 2013 – 11:05 pm <<<<<<<<<<<
    REMOTE_ADDR: 123.123.123.123
    Host Name: myserver.myhost.com
    HTTP_CLIENT_IP:
    HTTP_FORWARDED:
    HTTP_X_FORWARDED_FOR:
    HTTP_X_CLUSTER_CLIENT_IP:
    REQUEST_METHOD: GET
    HTTP_REFERER:
    REQUEST_URI: /wp-admin/post-new.php
    QUERY_STRING:
    HTTP_USER_AGENT: WordPress/3.5.1; http://www.mydomain.com

    Facebook problem:

    >>>>>>>>>>> 403 Error Logged – February 8, 2013 – 12:05 pm <<<<<<<<<<<
    REMOTE_ADDR: 173.252.110.112
    Host Name: 173.252.110.112
    HTTP_CLIENT_IP:
    HTTP_FORWARDED:
    HTTP_X_FORWARDED_FOR:
    HTTP_X_CLUSTER_CLIENT_IP:
    REQUEST_METHOD: GET
    HTTP_REFERER:
    REQUEST_URI: /wp-content/uploads/2011/08/icon.gif
    QUERY_STRING:
    HTTP_USER_AGENT: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

    Plugin Author AITpro

    @aitpro

    For the first error, whitelist the post-new.php file in your wp-admin .htaccess file. I am not sure what is causing that error, but it should be completely safe to whitelist that file. You would add this to BPS Custom Code:

    Add the .htaccess bypass / skip code below to the wp-admin Custom Code box, CUSTOM CODE WPADMIN PLUGIN FIXES, and then activate BulletProof Mode for your wp-admin folder again. The skip rule must be [S=2] because it will be written to your wp-admin .htaccess file above the skip / bypass rule [S=1]. This bypass / skip rule is safe to use because the wp-admin area is protected with WP Authentication security.

    # post-new.php bypass / skip rule
    RewriteCond %{REQUEST_URI} (post-new\.php) [NC]
    RewriteRule . - [S=2]
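
For anyone wondering why the number has to be 2, here is a toy Python model of how a skip flag behaves: a matching rule with [S=N] jumps over the next N rules. This is only a sketch and not how mod_rewrite is implemented. The `Rule` class, the `evaluate` function, the admin-ajax.php bypass and the catch-all blocking rule are all assumptions made for illustration; the real BPS filters are more selective than "block everything".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    matches: Callable[[str], bool]
    skip: int = 0         # like [S=N]: how many following rules to jump over
    forbid: bool = False  # like [F]: answer 403


def evaluate(rules: List[Rule], uri: str) -> int:
    """Walk the rules top to bottom and return the resulting status code."""
    i = 0
    while i < len(rules):
        rule = rules[i]
        if rule.matches(uri):
            if rule.forbid:
                return 403
            i += rule.skip  # jump past the next `skip` rules
        i += 1
    return 200


def build_rules(new_skip: int) -> List[Rule]:
    return [
        Rule(lambda u: "post-new.php" in u, skip=new_skip),  # new bypass, written first
        Rule(lambda u: "admin-ajax.php" in u, skip=1),       # hypothetical existing [S=1] bypass
        Rule(lambda u: True, forbid=True),                   # stand-in for the blocking filter
    ]


if __name__ == "__main__":
    print(evaluate(build_rules(2), "/wp-admin/post-new.php"))  # bypass reaches past the filter
    print(evaluate(build_rules(1), "/wp-admin/post-new.php"))  # lands on the filter instead
```

In this model, a new rule written with a skip of 1 would only jump over the existing bypass and land on the blocking rule, which is why each bypass added above the others needs a count one higher.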

    I ran into a dead end since I cannot view the Facebook script (not publicly available) and the Facebook Developers Debugger tool does not allow me to do multiple tests. I only get 1 test per session, or whatever other limit Facebook has restricted the debugger tool to.

    The issue seems to be some kind of Header problem with the externalhit_uatext.php script; I am guessing, since I cannot view the script. The image files are successfully being retrieved, but the script also appears to be trying to retrieve Header information, and that part is not succeeding. This would make logical sense because the error on the Facebook side is a 206 Error, which I take to mean the Header info could not be retrieved, so the 206 Partial Content error also makes total logical sense.

    Why the Facebook script cannot retrieve the Header, I have no idea. This may or may not have anything to do with BPS. When you Google this issue you will find plenty of folks discussing it.

    Where I am at is this: I have no idea if this is related to BPS or not. There is no negative impact since images are retrieved successfully. There is only a nuisance factor since these errors are being logged. Since this is only a nuisance issue it has very low priority, but further testing is scheduled. The problem I have is that I am shooting blind since I cannot view the Facebook script; it is not publicly available.

    Are you familiar with rewrite logging? I can try to get you a rewrite log this weekend.

    Plugin Author AITpro

    @aitpro

    pending further scheduled testing.

    Plugin Author AITpro

    @aitpro

    I have done that already. The problem is I cannot see what the Facebook script is doing. None of the logs that I am checking (Server Logs, BPS logs, Rewrite Logs, etc.) tell me anything about what the Facebook script is trying to do, so I am shooting blind.

    Plugin Author AITpro

    @aitpro

    This Facebook 206 Error is not a WordPress issue or a BPS issue. I have Googled this and I see this error occurring on non-WordPress sites.

    Here are a standard HTML site’s logs, so obviously there is something that is not quite right with the externalhit_uatext.php script itself. But yeah, I would like to get rid of the nuisance factor of BPS logging this script’s issues/problems. 😉
    http://happyhourtvmd.com/logs/access_121210.log

    Plugin Author AITpro

    @aitpro

    Hmm, that just gave me an idea. Perhaps the 206 error is being logged as a 403 error because an ErrorDocument 206 directive is not in the root .htaccess file. So logically something like this might work to get rid of the nuisance: add this ErrorDocument directive to your root .htaccess file, create a blank 206.php file, upload it to your site somewhere, and set the directive to the correct path to the 206.php file.

    ErrorDocument 206 /206.php

    Plugin Author AITpro

    @aitpro

    Nope, that did not work either, so I am at a dead end again and will revisit this another time. Whatever that Facebook script is doing, something about it must be seen as a threat by BPS. On the website end, the error really is a 403, and not a 206 being logged as a 403. When I check the linked image on Facebook itself, the image is displayed, so I am back at square 1: something about how the Header info is being requested violates some security rule in BPS. The 206 error appears to happen anyway on both WordPress and non-WordPress sites, so my only concern is getting rid of the nuisance, not working out what the 206 error on the Facebook end is about.

    >>>>>>>>>>> 403 GET or Other Request Error Logged - February 8, 2013 - 12:04 pm <<<<<<<<<<<
    REMOTE_ADDR: 69.171.247.113
    Host Name: 69.171.247.113
    HTTP_CLIENT_IP:
    HTTP_FORWARDED:
    HTTP_X_FORWARDED_FOR:
    HTTP_X_CLUSTER_CLIENT_IP:
    REQUEST_METHOD: GET
    HTTP_REFERER:
    REQUEST_URI: /aitpro-blog/wp-content/themes/AITpro/images/aitpro-logo-footer.png
    QUERY_STRING:
    HTTP_USER_AGENT: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

  • The topic ‘[Resolved] Linkchecker and other legit bots are broken’ is closed to new replies.