Image bots blocked on Multisite
-
Hi there,
we’ve got a multisite with a subdomain setup. I’ve used the AutoMagic buttons to create the .htaccess files. Now the log file is showing that all image bots are being blocked.
I found this solution: http://wordpress.org/support/topic/bulletproof-security-block-google-to-access-images
# Theme Timthumb skip/bypass rule
RewriteCond %{REQUEST_URI} ^/wp-content/themes/MYTHEME/(.*)timthumb\.php [NC]
RewriteRule . - [S=13]
But since we’re using multiple themes, what do I do? Write out all the themes in use? If so, what’s the correct format? I’m afraid to mess with the .htaccess file.
Thank you
-
Please post a Security log entry related to this issue/problem from your BPS Security Log. Please only post 1 log entry.
General Help Information About BPS and WordPress Network / Multisite Websites
Do not Network Activate BPS.
BPS should only be activated on the Primary Site.
BPS AutoMagic buttons should ONLY be used on the Primary Site.
Network/Multisite sub-sites are virtual sites – BulletProof Modes should NOT be activated on sub-sites.
ONLY Super Admins can see the BPS menu in sub-sites for both BPS Free and BPS Pro.
If a Site Admin tries to view BPS Settings or BPS pages, a Super Admins Permissions Error message is displayed.
-
I’m actually not sure that I read the log file correctly.
Actual names are replaced with FILENAME and MYWEBSITEURL.COM
>>>>>>>>>>> 403 GET or Other Request Error Logged - May 22, 2013 - 2:39 pm <<<<<<<<<<<
REMOTE_ADDR: 66.249.73.3
Host Name: crawl-66-249-73-3.googlebot.com
HTTP_CLIENT_IP:
HTTP_FORWARDED:
HTTP_X_FORWARDED_FOR:
HTTP_X_CLUSTER_CLIENT_IP:
REQUEST_METHOD: GET
HTTP_REFERER:
REQUEST_URI: /tag/soil/%3C?php%20bloginfo(\'template_directory\');%20?%3E/images/FILENAME.png%E2%80%9D%20alt=%E2%80%9D
QUERY_STRING:
HTTP_USER_AGENT: Googlebot-Image/1.0
Here’s another one:
>>>>>>>>>>> 403 GET or Other Request Error Logged - May 22, 2013 - 4:31 pm <<<<<<<<<<<
REMOTE_ADDR: 50.116.103.39
Host Name: 50.116.103.39
HTTP_CLIENT_IP:
HTTP_FORWARDED:
HTTP_X_FORWARDED_FOR:
HTTP_X_CLUSTER_CLIENT_IP:
REQUEST_METHOD: GET
HTTP_REFERER:
REQUEST_URI: /files/2013/05/shutterstock_78551506-270x180.jpg
QUERY_STRING:
HTTP_USER_AGENT: WordPress/3.5.1; http://MYWEBSITEURL.COM
Last one
>>>>>>>>>>> 403 GET or Other Request Error Logged - May 17, 2013 - 10:33 <<<<<<<<<<<
REMOTE_ADDR: 66.249.73.3
Host Name: crawl-66-249-73-3.googlebot.com
HTTP_CLIENT_IP:
HTTP_FORWARDED:
HTTP_X_FORWARDED_FOR:
HTTP_X_CLUSTER_CLIENT_IP:
REQUEST_METHOD: GET
HTTP_REFERER:
REQUEST_URI: /ARTICLENAME/%3C?php%20bloginfo(\'template_directory\');%20?%3E/images/FILENAME.png%C2%A0%C2%BB%20alt=%C2%A0%C2%BB
QUERY_STRING:
HTTP_USER_AGENT: Googlebot-Image/1.0
I followed those instructions when setting it up.
This is a coding mistake. The backslashes should not be there. Most likely this is bad code in your Theme. Notify the Theme author and have them remove the backslashes. You never need to escape something like this with backslashes.
bloginfo(\’template_directory\’)

This is a common, known issue: the Googlebot is able to retrieve images successfully, but something else the script is doing also triggers a 403 error. You can choose to ignore these errors on the Security Log page by ignoring the Googlebot User Agent (meaning the errors are not logged – NOT that the Googlebot itself is ignored), or you can whitelist the Googlebot. The choice is up to you.
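To illustrate the bug in PHP terms – this is a hedged sketch, not the actual theme code, and FILENAME.png is a placeholder from the log entries above:

```php
<?php
// Hypothetical illustration of the bug described above. If a theme
// emits the template tag as literal text (for example from a string
// with backslash-escaped quotes), bloginfo() is never executed, so the
// un-parsed tag lands in the HTML src attribute -- which is exactly
// the URL-encoded junk the Googlebot then requests and gets a 403 on.

// Correct usage: call bloginfo() directly in the template so the real
// theme path is printed into the attribute.
?>
<img src="<?php bloginfo('template_directory'); ?>/images/FILENAME.png" alt="" />
```

With the tag executed properly, the rendered HTML contains the real theme URL instead of the literal `bloginfo(...)` text seen in the logged REQUEST_URI.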
Whitelist approach…
Another approach would be to whitelist the Googlebot User Agent. You would add RewriteCond %{HTTP_USER_AGENT} ^.*Googlebot.* to this security filter in your root .htaccess file.
# TIMTHUMB FORBID RFI and MISC FILE SKIP/BYPASS RULE
# Only Allow Internal File Requests From Your Website
# To Allow Additional Websites Access to a File Use [OR] as shown below.
# RewriteCond %{HTTP_REFERER} ^.*YourWebsite.com.* [OR]
# RewriteCond %{HTTP_REFERER} ^.*AnotherWebsite.com.*
RewriteCond %{QUERY_STRING} ^.*(http|https|ftp)(%3A|:)(%2F|/)(%2F|/)(w){0,3}.?(blogger|picasa|blogspot|tsunami|petapolitik|photobucket|imgur|imageshack|wordpress\.com|img\.youtube|tinypic\.com|upload\.wikimedia|kkc|start-thegame).*$ [NC,OR]
RewriteCond %{THE_REQUEST} ^.*(http|https|ftp)(%3A|:)(%2F|/)(%2F|/)(w){0,3}.?(blogger|picasa|blogspot|tsunami|petapolitik|photobucket|imgur|imageshack|wordpress\.com|img\.youtube|tinypic\.com|upload\.wikimedia|kkc|start-thegame).*$ [NC]
RewriteRule .* index.php [F,L]
RewriteCond %{REQUEST_URI} (timthumb\.php|phpthumb\.php|thumb\.php|thumbs\.php) [NC]
RewriteCond %{HTTP_REFERER} ^.*example.com.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Googlebot.*
RewriteRule . - [S=1]
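On the question asked earlier about covering several themes in the Timthumb skip rule: one way to sketch it is to alternate the theme folder names in a single RewriteCond. This is a hedged example only – THEMEONE and THEMETWO are placeholder folder names, and the [S=13] skip count must match whatever the AutoMagic buttons generated in your root .htaccess file:

```apache
# Hypothetical multi-theme skip/bypass rule. THEMEONE/THEMETWO are
# placeholders for your actual theme folder names; the alternation
# (A|B) matches either folder. [S=13] must match the skip count in
# your generated .htaccess file or rules will be skipped incorrectly.
RewriteCond %{REQUEST_URI} ^/wp-content/themes/(THEMEONE|THEMETWO)/(.*)timthumb\.php [NC]
RewriteRule . - [S=13]
```

Add further folder names with additional `|` alternations rather than duplicating the whole rule, so the skip count only has to be kept correct in one place.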
Ignore logging approach…
Go to the Security Log page, enter Googlebot for the Add User Agents/Bots to Ignore/Not Log and click the Add/Ignore button.
I’m trying the whitelist approach first.
Would you say that one of these methods is better than another?
Thank you!
Well, to analyze it: the problem starts with crappy code in a Theme or plugin doing something in a very poor way. Unfortunately, this is very common. So what you want to do is work around that bad/poor coding and get rid of the nuisance errors that are generated because of it.
Here is the pitfall of whitelisting the Googlebot: hackers can spoof/fake being the Googlebot. So if you whitelist the Googlebot, you may actually be whitelisting a hacking attempt against your website that uses Timthumb as the exploit.
Personally I think it is better just to ignore the nuisance errors because this means your site is still protected, but you are no longer seeing log entries due to the original problem – poor coding work in a Theme or plugin.
What negates the pitfall of ignoring actual real googlebot errors is this – if your site actually does have an image retrieval problem then Google will let you know about it. 😉
Great, thank you, this helps a lot.
I’ve regenerated a new secure .htaccess file to remove the stuff I added, and have instead added Googlebot-Image to the ‘ignore logging’ list.
Eventually the themes are going to get fixed, but I don’t see that as a priority.
***
I’ve got a different question now. Sorry, not sure if it’s OK to continue the conversation or whether I should start a new topic. I found 2 backdoor scripts sitting in different directories and have removed them. Before removing them I tested one, and because it was in the /themes/ directory, your wonderful plugin disallowed direct access to the .php file. Does that mean that if there are undiscovered malicious scripts sitting in other directories, hackers won’t be able to access them now?
Eventually the themes are going to get fixed, but I don’t see that as a priority.
The path to image files is not valid so these images will not be displayed until the code is fixed. Instead of seeing bloginfo(\’template_directory\’) you should instead be seeing the actual path to the template directory/your Theme folder, but since the code is not correct you will never see any images displayed until that is fixed.
If you found hacker files on your website, then there are most likely several more that you did not find. You need to change all of your passwords. Then you should restore your website from a good, clean backup; or, if you do not have one, back up your database, nuke this site, and install a new one. Finally, import ONLY your content database tables into your new database.
Resolved.
- The topic ‘Image bots blocked on Multisite’ is closed to new replies.