How do I allow AmazonAWS to crawl?
-
I’m trying to run the RavenCrawler on a client’s website, but iThemes Security is preventing it from happening. I looked into whitelisting the ip ranges that AmazonAWS uses, but they’re provided in a JSON format, and it doesn’t look like iThemes supports that.
Is there a way that I could allow the RavenCrawler in the .htaccess file to give it root level permissions to crawl the entire site? If so, how would I go about doing that? This is a fairly urgent matter that I need to resolve ASAP.
Please help me out.
-
What makes you think it is the iTSec plugin stopping RavenCrawler ?
Does it work when you temporarily deactivate the iTSec plugin ?dwinden
I disabled the security plugin and the crawl was working, but I can’t leave it disabled because the client’s user base can’t log in when it’s not active.
Ok, I see.
It’s probably something in the .htaccess file that is blocking it.
Reactivate the iTSec plugin (if not already).And then if possible please post the content of the .htaccess file while obfuscating any sensitive info.
Also confirm you are using the iTSec plugin release 5.2.1
dwinden
# BEGIN iThemes Security – Do not modify or remove this line
# iThemes Security Config Details: 2
# Enable HackRepair.com’s blacklist feature – Security > Settings > Banned Users > Default Blacklist
# Start HackRepair.com Blacklist
RewriteEngine on
# Start Abuse Agent Blocking
RewriteCond %{HTTP_USER_AGENT} “^Mozilla.*Indy” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Mozilla.*NEWT” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^$” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Maxthon$” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^SeaMonkey$” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Acunetix” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^binlar” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^BlackWidow” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Bolt 0” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^BOT for JCE” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Bot mailto\:craftbot@yahoo\.com” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^casper” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^checkprivacy” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^ChinaClaw” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^clshttp” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^cmsworldmap” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^comodo” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Custo” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Default Browser 0” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^diavol” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^DIIbot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^DISCo” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^dotbot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Download Demon” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^eCatch” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^EirGrabber” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^EmailCollector” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^EmailSiphon” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^EmailWolf” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Express WebPictures” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^extract” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^ExtractorPro” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^EyeNetIE” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^feedfinder” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^FHscan” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^FlashGet” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^flicky” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^g00g1e” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^GetRight” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^GetWeb\!” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Go\!Zilla” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Go\-Ahead\-Got\-It” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^grab” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^GrabNet” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Grafula” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^harvest” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^HMView” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^ia_archiver” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Image Stripper” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Image Sucker” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^InterGET” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Internet Ninja” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^InternetSeer\.com” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^jakarta” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Java” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^JetCar” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^JOC Web Spider” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^kanagawa” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^kmccrew” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^larbin” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^LeechFTP” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^libwww” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Mass Downloader” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^microsoft\.url” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^MIDown tool” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^miner” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Mister PiX” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^MSFrontPage” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Navroad” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^NearSite” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Net Vampire” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^NetAnts” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^NetSpider” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^NetZIP” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^nutch” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Octopus” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Offline Explorer” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Offline Navigator” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^PageGrabber” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Papa Foto” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^pavuk” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^pcBrowser” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^PeoplePal” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^planetwork” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^psbot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^purebot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^pycurl” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^RealDownload” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^ReGet” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Rippers 0” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^sitecheck\.internetseer\.com” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^SiteSnagger” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^skygrid” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^SmartDownload” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^sucker” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^SuperBot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^SuperHTTP” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Surfbot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^tAkeOut” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Teleport Pro” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Toata dragostea mea pentru diavola” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^turnit” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^vikspider” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^VoidEYE” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Web Image Collector” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Web Sucker” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebAuto” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebBandit” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebCopier” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebFetch” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebGo IS” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebLeacher” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebReaper” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebSauger” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Website eXtractor” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Website Quester” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebStripper” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebWhacker” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WebZIP” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Wget” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Widow” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WPScan” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WWW\-Mechanize” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^WWWOFFLE” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Xaldon WebSpider” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^Zeus” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “^zmeu” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “360Spider” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “AhrefsBot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “CazoodleBot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “discobot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “EasouSpider” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “ecxi” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “GT\:\:WWW” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “heritrix” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “HTTP\:\:Lite” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “HTTrack” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “ia_archiver” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “id\-search” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “IDBot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “Indy Library” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “IRLbot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “ISC Systems iRc Search 2\.1” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “LinksCrawler” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “LinksManager\.com_bot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “linkwalker” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “lwp\-trivial” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “MFC_Tear_Sample” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “Microsoft URL Control” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “Missigua Locator” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “MJ12bot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “panscient\.com” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “PECL\:\:HTTP” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “PHPCrawl” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “PleaseCrawl” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “SBIder” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “SearchmetricsBot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “SeznamBot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “Snoopy” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “Steeler” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “URI\:\:Fetch” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “urllib” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “Web Sucker” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “webalta” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “WebCollage” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “Wells Search II” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “WEP Search” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “XoviBot” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “YisouSpider” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “zermelo” [NC,OR]
RewriteCond %{HTTP_USER_AGENT} “ZyBorg” [NC,OR]
# End Abuse Agent Blocking
# Start Abuse HTTP Referrer Blocking
RewriteCond %{HTTP_REFERER} “^https?://(?:[^/]+\.)?semalt\.com” [NC,OR]
RewriteCond %{HTTP_REFERER} “^https?://(?:[^/]+\.)?kambasoft\.com” [NC,OR]
RewriteCond %{HTTP_REFERER} “^https?://(?:[^/]+\.)?savetubevideo\.com” [NC]
# End Abuse HTTP Referrer Blocking
RewriteRule ^.* – [F,L]
# End HackRepair.com Blacklist, http://pastebin.com/u/hackrepair# Enable the hide backend feature – Security > Settings > Hide Login Area > Hide Backend
RewriteRule ^(/)?wwlogin/?$ /wp-login.php [QSA,L]# Protect System Files – Security > Settings > System Tweaks > System Files
<files .htaccess>
<IfModule mod_authz_core.c>
Require all denied
</IfModule>
<IfModule !mod_authz_core.c>
Order allow,deny
Deny from all
</IfModule>
</files>
<files readme.html>
<IfModule mod_authz_core.c>
Require all denied
</IfModule>
<IfModule !mod_authz_core.c>
Order allow,deny
Deny from all
</IfModule>
</files>
<files readme.txt>
<IfModule mod_authz_core.c>
Require all denied
</IfModule>
<IfModule !mod_authz_core.c>
Order allow,deny
Deny from all
</IfModule>
</files>
<files install.php>
<IfModule mod_authz_core.c>
Require all denied
</IfModule>
<IfModule !mod_authz_core.c>
Order allow,deny
Deny from all
</IfModule>
</files>
<files wp-config.php>
<IfModule mod_authz_core.c>
Require all denied
</IfModule>
<IfModule !mod_authz_core.c>
Order allow,deny
Deny from all
</IfModule>
</files># Disable Directory Browsing – Security > Settings > System Tweaks > Directory Browsing
Options -Indexes<IfModule mod_rewrite.c>
RewriteEngine On# Protect System Files – Security > Settings > System Tweaks > System Files
RewriteRule ^wp-admin/includes/ – [F]
RewriteRule !^wp-includes/ – [S=3]
RewriteCond %{SCRIPT_FILENAME} !^(.*)wp-includes/ms-files.php
RewriteRule ^wp-includes/[^/]+\.php$ – [F]
RewriteRule ^wp-includes/js/tinymce/langs/.+\.php – [F]
RewriteRule ^wp-includes/theme-compat/ – [F]# Filter Request Methods – Security > Settings > System Tweaks > Request Methods
RewriteCond %{REQUEST_METHOD} ^(TRACE|DELETE|TRACK) [NC]
RewriteRule ^.* – [F]# Filter Suspicious Query Strings in the URL – Security > Settings > System Tweaks > Suspicious Query Strings
RewriteCond %{QUERY_STRING} \.\.\/ [NC,OR]
RewriteCond %{QUERY_STRING} ^.*\.(bash|git|hg|log|svn|swp|cvs) [NC,OR]
RewriteCond %{QUERY_STRING} etc/passwd [NC,OR]
RewriteCond %{QUERY_STRING} boot\.ini [NC,OR]
RewriteCond %{QUERY_STRING} ftp\: [NC,OR]
RewriteCond %{QUERY_STRING} http\: [NC,OR]
RewriteCond %{QUERY_STRING} https\: [NC,OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [NC,OR]
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [NC,OR]
RewriteCond %{QUERY_STRING} ^.*(%24&x).* [NC,OR]
RewriteCond %{QUERY_STRING} ^.*(127\.0).* [NC,OR]
RewriteCond %{QUERY_STRING} ^.*(globals|encode|localhost|loopback).* [NC,OR]
RewriteCond %{QUERY_STRING} ^.*(request|concat|insert|union|declare).* [NC]
RewriteCond %{QUERY_STRING} !^loggedout=true
RewriteCond %{QUERY_STRING} !^action=jetpack-sso
RewriteCond %{QUERY_STRING} !^action=rp
RewriteCond %{HTTP_COOKIE} !^.*wordpress_logged_in_.*$
RewriteCond %{HTTP_REFERER} !^http://maps\.googleapis\.com(.*)$
RewriteRule ^.* – [F]# Reduce Comment Spam – Security > Settings > System Tweaks > Comment Spam
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} /wp-comments-post\.php$
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_REFERER} !^https?://(([^/]+\.)?231\.124|jetpack\.wordpress\.com/jetpack-comment)(/|$) [NC]
RewriteRule ^.* – [F]
</IfModule>
# END iThemes Security – Do not modify or remove this line# HTTP AUTH
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ – [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule># END WordPress
It looks like the versioning on the theme is 5.1.1
Ok, try and disable the Enable HackRepair.com’s blacklist feature setting in the Banned Users section of the iTSec plugin Settings page.
I haven’t seen ‘RavenCrawler’ in the list but who knows …
dwinden
It went through, no clue what changed. Will chalk it up to a resolved mystery.
The topic ‘How do I allow AmazonAWS to crawl?’ is closed to new replies.