Plugin Author
AITpro
(@aitpro)
1. Hmm, I am not sure this should be categorized as a bug, since BPS is actively blocking something that violates the security rules/filters in the root .htaccess file. I will install and test the Broken Link Checker plugin to see what is being blocked and why. Pending testing.
2. I don’t believe that this is a bug either. I am not exactly sure what is causing these errors. I would appreciate any information that you can provide so that I can narrow down what is actually going on, including whether this is really the Facebook bot or just some new spam bot disguised as a legitimate bot. Logically, either some plugin that legitimately connects with Facebook is in the equation, or something has changed about the way Facebook now retrieves image files. For example, the way the image files are being retrieved may violate the security rules/filters in the root htaccess file. So please post any plugins that you have installed that have anything at all to do with Facebook, or any other logically relevant cause that you think could be in the equation.
3. Some server configurations do not allow certain directives to be used in htaccess files. One of the more common htaccess directives that is disallowed on some hosts is the Options directive, but I have also seen some hosts disallow the DirectoryIndex directive.
The majority of hosts permit both of these directives in their httpd.conf settings, which in turn means they are allowed in htaccess files. I think the ratio is around 99% of hosts that allow these directives to 1% that do not. I will look into whether it is possible to detect if these directives are allowed on a particular host and then write or skip them based on the result. I don’t really think this is possible, but I will check it out anyway.
Plugin Author
AITpro
(@aitpro)
Now anytime this plugin is updated I will have to comment out that line again.
Actually, you would not have to comment out that line again. BPS updates are now automated. You do not need to click the AutoMagic buttons and activate BulletProof Modes anymore when installing a BPS upgrade. BPS will not change any htaccess code modifications that you have made. On upgrade, BPS will only update the .htaccess files automatically to add new .htaccess code, remove obsolete code, or do other htaccess code housecleaning.
So if you used the AutoMagic buttons again then yes you would need to comment out that line again.
Plugin Author
AITpro
(@aitpro)
Oh wow! I am seeing the Facebook UA in my logs now too. So this is definitely something new that Facebook is doing. I am not using any Facebook-related plugins, so that is ruled out. That means this is either something new that Facebook is doing to retrieve image files or some new form of spam/recon/sniffer bot. I will figure this out and post the solution here.
>>>>>>>>>>> 403 Error Logged - February 6, 2013 - 12:09 pm <<<<<<<<<<<
REMOTE_ADDR: 69.171.247.112
Host Name: 69.171.247.112
HTTP_CLIENT_IP:
HTTP_FORWARDED:
HTTP_X_FORWARDED_FOR:
HTTP_X_CLUSTER_CLIENT_IP:
REQUEST_METHOD: GET
HTTP_REFERER:
REQUEST_URI: /wp-content/uploads/2012/11/Wordfence-P3-Profiler-Scan-1-300x237.png
QUERY_STRING:
HTTP_USER_AGENT: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
Plugin Author
AITpro
(@aitpro)
What is important to note and keep in mind is that the image files themselves are not being blocked. What is being blocked is the check to see whether your image files still exist at that URL. I am still trying to track down the script itself. It is probably not publicly available though...
Plugin Author
AITpro
(@aitpro)
aha now I am getting somewhere
Facebook Developers Debugger tool to check Open Graph, etc.
http://developers.facebook.com/tools/debug
Facebook Crawler / Scraper
https://developers.facebook.com/docs/ApplicationSecurity/#facebook_scraper
Thanks for your fast and thorough response. I’ve become very familiar with the .htaccess rules and I’ve spent a while attempting to figure out why it’s blocking the link checker and the Facebook bot. I assume the Facebook bot is downloading the thumbnail for the page. This is what we want because people are more likely to click on a link if it has a thumbnail.
I thought perhaps the link checker was making a HEAD request but it says GET in the log.
Also, let me clarify point 3. I have http://www.mydomain.com/ with WordPress installed at the root. I then have http://www.mydomain.com/dir/, in which I have placed a .htaccess file containing a single line, “Options +Indexes”, in order to display the index in that one directory but no others. The line that I commented out interfered with my customization.
I also put the following custom code in the custom code tab so that I can exempt my directories from being processed by the WordPress script. This had worked with previous versions of BulletProof Security but only recently stopped working. It took a lot of effort to find out what the problem was. I’ve worked around this issue, but I thought you might want to know about it in case it can help another user.
# EXCEPTIONS FOR VARIOUS MYCOMPANY DIRECTORIES
RewriteCond %{REQUEST_URI} ^/dir [NC]
RewriteRule . - [L]
RewriteCond %{REQUEST_URI} ^/dir2 [NC]
RewriteRule . - [L]
(Please note that the above customizations are on one WordPress installation. Problems 1 and 2 were replicated on a vanilla install on a different server.)
I will try to get you a rewrite log later.
Plugin Author
AITpro
(@aitpro)
I have not determined yet what the script is doing. From everything I have read so far, all that script does is verify that the image file still exists, and it does not do anything else. Once I figure out exactly what it is doing and how it is doing it, I will create a solution.
A HEAD Request will be logged as a GET Request.
Yes, that would make sense because the 2 directives conflict with each other.
Yep, thanks for posting that custom code as it may help someone else out with that exact same scenario.
Plugin Author
AITpro
(@aitpro)
What is interesting is this:
Using the Facebook Developers Debugger tool, the thumbnail image is retrieved successfully and the image file itself is retrieved successfully, but you also see a 206 response code (Partial Content) along with a “Can’t Download” error. I keep running into hints that caching is somehow involved in this equation.
http://100pulse.com/http-statuscode/206.jsp
Scrape Information
Response Code: 206
Fetched URL: http://forum.ait-pro.com/wp-content/uploads/2012/11/Wordfence-P3-Profiler-Scan-1-300x237.png
Canonical URL: http://forum.ait-pro.com/wp-content/uploads/2012/11/Wordfence-P3-Profiler-Scan-1-300x237.png
Errors That Must Be Fixed
Can’t Download: Could not retrieve data from URL.
URLs
Graph API: http://graph.facebook.com/210058239138438
Scraped URL: See exactly what our scraper sees for your URL
Type of Share
When this URL is shared on facebook, it is treated as a certain type. By putting meta tags on this page, you can influence how it is shared.
Photo
A HEAD Request will be logged as a GET Request.
I will remove HEAD from the htaccess file and let you know if this fixes the link checker. I can’t test the Facebook issue until later tonight because current development is on an internal server that can’t be accessed by Facebook.
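To illustrate why removing HEAD from the filter should matter, here is a small Python sketch of a request-method filter. The two patterns are assumed stand-ins for the kind of request-method rule being discussed and are not copied from the actual BPS htaccess code. The function name and the 403/200 return values are also hypothetical.

```python
import re

# Hypothetical method lists standing in for an htaccess request-method filter.
# These are assumptions for illustration, not the real BPS rule.
FILTER_WITH_HEAD = re.compile(r"^(HEAD|TRACE|DELETE|TRACK|DEBUG)$", re.IGNORECASE)
FILTER_WITHOUT_HEAD = re.compile(r"^(TRACE|DELETE|TRACK|DEBUG)$", re.IGNORECASE)


def status_for(method: str, method_filter: re.Pattern) -> int:
    """Return 403 if the filter matches the request method, otherwise 200."""
    return 403 if method_filter.match(method) else 200


if __name__ == "__main__":
    for name, flt in (("with HEAD", FILTER_WITH_HEAD), ("without HEAD", FILTER_WITHOUT_HEAD)):
        for method in ("GET", "HEAD", "TRACE"):
            print(name, method, status_for(method, flt))
```

With HEAD in the list, a HEAD request is refused while a GET for the same URL passes. With HEAD removed, both pass and only the remaining listed methods are refused.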
Plugin Author
AITpro
(@aitpro)
Which caching plugin do you use?
Plugin Author
AITpro
(@aitpro)
Hmm, interesting, because I recently deleted my caching plugin and am now doing caching purely with htaccess code. I need to check several sites and compare the differences. Getting warmer.
It looks like the Broken Link Checker plugin is indeed using HEAD requests. I haven’t had a 403 error since removing HEAD checking.
If the log incorrectly characterizes a HEAD as a GET, then that’s a problem. It really was a head scratcher.
This is from broken-link-checker/modules/checkers/http.php:
if ( $nobody ){
    //If possible, use HEAD requests for speed.
    curl_setopt($ch, CURLOPT_NOBODY, true);
} else {
    //If we must use GET at least limit the amount of downloaded data.
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Range: bytes=0-2048')); //2 KB
}
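For anyone who wants to reproduce the two kinds of request outside the plugin, here is a rough Python equivalent of that branch. It is only a sketch: the function name and the example URL are made up for illustration, and it builds the request without sending it.

```python
import urllib.request


def build_check_request(url: str, nobody: bool) -> urllib.request.Request:
    """Build (but do not send) a link-check request, mirroring the excerpt's branch.

    The function name is hypothetical; it is not part of the plugin.
    """
    if nobody:
        # HEAD asks for the response headers only, with no body.
        return urllib.request.Request(url, method="HEAD")
    # Otherwise use GET, but ask for only the first bytes of the file.
    # Byte ranges are inclusive, so 0-2048 covers 2049 bytes (roughly 2 KB).
    # A server that honors the Range header answers 206 Partial Content.
    return urllib.request.Request(url, headers={"Range": "bytes=0-2048"}, method="GET")


if __name__ == "__main__":
    head_req = build_check_request("http://www.mydomain.com/image.png", nobody=True)
    get_req = build_check_request("http://www.mydomain.com/image.png", nobody=False)
    print(head_req.get_method())
    print(get_req.get_method(), get_req.get_header("Range"))
```

A ranged GET like the second one is one way a client ends up with a 206 response code. Whether the Facebook scraper sends a Range header is not confirmed anywhere in this thread, so treat that connection as a guess.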
Side note re caching plugins: I’d love to use one, but I had problems with certain dynamic content.