Title: Blocking “crawler”
Last modified: November 27, 2024

---

# Blocking “crawler”

 *  Resolved [mywebmaestro](https://wordpress.org/support/users/mywebmaestro/)
 * (@mywebmaestro)
 * [1 year, 5 months ago](https://wordpress.org/support/topic/blocking-crawler/)
 * Many of my sites are bleeding bandwidth (upwards of 20GB a month) from traffic
   that Blackhole isn’t catching. In Awstats, it’s listed as “crawler”. When I tried
   adding a modsec rule for that, it seemed to block multiple bots, so I’m not sure
   whether the Awstats info is reliable.
   In any case, while I do get occasional notices from the plugin that a bot has
   been blocked, it seems to miss a lot. Is the only protection based on a bot
   following a link it’s told not to? Is there something I’m missing in how to set
   this up effectively?
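
One way to check whether the Awstats “crawler” bucket is hiding several different bots is to total the bandwidth per User-Agent straight from the raw access log. The following is a minimal sketch, assuming the server writes the common “combined” log format; the function name `bytes_by_agent` and the regular expression are illustrative and not part of Awstats or the plugin.

```python
import re
import sys
from collections import Counter

# Matches the tail of a combined-format log line:
#   "request" status bytes "referer" "user-agent"
# Assumption: the server logs in combined format; adjust if yours differs.
LINE_RE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_by_agent(lines):
    """Sum response bytes per User-Agent string."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not in combined format
        size = match.group("bytes")
        # A "-" in the bytes field means no body was sent.
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bytes_by_agent.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for agent, total in bytes_by_agent(log).most_common(20):
            print(f"{total / 1e6:10.1f} MB  {agent}")
```

If the heaviest entries turn out to be several unrelated User-Agent strings, that would be consistent with a single “crawler” rule matching multiple bots.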

Viewing 6 replies - 1 through 6 (of 6 total)

 *  Plugin Author [Jeff Starr](https://wordpress.org/support/users/specialk/)
 * (@specialk)
 * [1 year, 5 months ago](https://wordpress.org/support/topic/blocking-crawler/#post-18179668)
 * The first and most important question: is there any *page caching* happening on
   the site? As explained in the docs, page caching prevents dynamic plugins like
   Blackhole from working correctly. So that would be the first thing to check.
 *  Thread Starter [mywebmaestro](https://wordpress.org/support/users/mywebmaestro/)
 * (@mywebmaestro)
 * [1 year, 5 months ago](https://wordpress.org/support/topic/blocking-crawler/#post-18181250)
 * I have Hummingbird installed (from WPMUDEV) but page caching is turned off. I
   have browser and gravatar caching enabled. [https://wpmudev.com/docs/wpmu-dev-plugins/hummingbird/#caching](https://wpmudev.com/docs/wpmu-dev-plugins/hummingbird/#caching)
 * Am I right though in understanding that in order to get blocked, a “bad” bot 
   has to break the coded rule and try indexing the forbidden link? Are there any
   other protection options based on other behavior?
 *  Plugin Author [Jeff Starr](https://wordpress.org/support/users/specialk/)
 * (@specialk)
 * [1 year, 4 months ago](https://wordpress.org/support/topic/blocking-crawler/#post-18193244)
 * That is correct, as explained in the plugin docs.
 *  Thread Starter [mywebmaestro](https://wordpress.org/support/users/mywebmaestro/)
 * (@mywebmaestro)
 * [1 year, 4 months ago](https://wordpress.org/support/topic/blocking-crawler/#post-18193298)
 * Are there any plans to add a way to combat the AI training scraping that’s
   going on? I’m seeing a lot of bot traffic from it that doesn’t obey robots.txt
   and doesn’t always identify itself consistently.
 *  Plugin Author [Jeff Starr](https://wordpress.org/support/users/specialk/)
 * (@specialk)
 * [1 year, 4 months ago](https://wordpress.org/support/topic/blocking-crawler/#post-18193707)
 * It’s a good idea and something that I hope to implement soon.
 *  Plugin Author [Jeff Starr](https://wordpress.org/support/users/specialk/)
 * (@specialk)
 * [1 year, 2 months ago](https://wordpress.org/support/topic/blocking-crawler/#post-18321160)
 * Hey [@mywebmaestro](https://wordpress.org/support/users/mywebmaestro/), I hope
   you got this sorted. It’s been a while with no reply, so I’m going to go ahead
   and mark this thread as resolved to help keep the forum organized. Feel free to
   post again with any further questions or feedback. Thank you.
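
The trap behaviour confirmed in the replies above (a bot is blocked only after it requests a link that robots.txt tells it to avoid) can be sketched as follows. This is an illustration of the general honeypot technique under stated assumptions, not the plugin’s actual code: `TRAP_PATH`, `ROBOTS_TXT`, and the `TrapFilter` class are hypothetical names, and blocking by IP address is an assumption made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical trap URL; a real site would link to it invisibly in its pages.
TRAP_PATH = "/blackhole-trap/"

# Well-behaved crawlers read this rule and never request the trap URL.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"


@dataclass
class TrapFilter:
    """Blocks any client that requests the disallowed trap path."""

    blocked_ips: set = field(default_factory=set)

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status code to send for this request."""
        if ip in self.blocked_ips:
            return 403  # already caught on an earlier request
        if path.startswith(TRAP_PATH):
            self.blocked_ips.add(ip)  # ignored robots.txt: block from now on
            return 403
        return 200  # never touched the trap, so nothing to act on
```

The last branch shows the limitation raised in this thread: a scraper that ignores robots.txt but never happens to request the trap URL is never blocked, however much bandwidth it uses.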


The topic ‘Blocking “crawler”’ is closed to new replies.

 * [Blackhole for Bad Bots](https://wordpress.org/plugins/blackhole-bad-bots/)

## Tags

 * [crawler](https://wordpress.org/support/topic-tag/crawler/)

 * 6 replies
 * 2 participants
 * Last reply from: [Jeff Starr](https://wordpress.org/support/users/specialk/)
 * Last activity: [1 year, 2 months ago](https://wordpress.org/support/topic/blocking-crawler/#post-18321160)
 * Status: resolved