Support » Plugin: Blackhole for Bad Bots » Archive.org Wayback Machine

  • Resolved toolsavvy

    (@toolsavvy)


    In your plugin’s description, you state…

    …Blackhole only affects bad bots: human users never see the hidden link, and good bots obey the robots rules in the first place.

    I want to block Archive.org Wayback Machine.

    Apparently Archive.org’s bots (ia_archiver and archive.org_bot) stopped obeying robots.txt files around late 2017. From 2015/2016 I had successfully blocked the Archive.org Wayback Machine from crawling and archiving my sites, but sometime in late 2017 they stopped obeying my robots.txt file and have since crawled and archived all my sites. Formal emails asking them to remove my sites have been fruitless. I have had the following entries in my robots.txt file for years, and they used to work…

    User-agent: archive.org_bot
    Disallow: /
    
    User-agent: ia_archiver
    Disallow: /

    But they no longer work. Last week, I added the following meta tags to my site…

    <meta name="ia_archiver" content="noindex,nofollow,noarchive">
    <meta name="archive.org_bot" content="noindex,nofollow,noarchive">

    …and that also does not seem to be working.

    So since archive.org apparently does not obey robots.txt files any longer, will your plugin block/trap ia_archiver and archive.org_bot bots? This is what I am looking for.

Viewing 3 replies - 1 through 3 (of 3 total)
  • Plugin Author Jeff Starr

    (@specialk)

    Glad to help:

    “will your plugin block/trap ia_archiver and archive.org_bot bots?”

    Yes, but only if they follow the hidden blackhole link. Otherwise, it is possible to add/block manually using the pro version.

    Let me know if I can provide any further info, glad to help.

    toolsavvy

    (@toolsavvy)

    Otherwise, it is possible to add/block manually using the pro version.

    Interesting. How exactly does that work in the Pro version? I mean, do I have to know all of the IP addresses archive.org uses for its bots in order to manually ban them with the Pro version?

    Plugin Author Jeff Starr

    (@specialk)

    Yeah it’s all IP-based. If you want to know more about the Pro version you can contact me directly (the forums here at WordPress.org are for free plugins only).

    As for blocking based on user agent, I haven’t seen a plugin that can do it, although it may be possible as a feature in one of the popular “all-in-one” type security plugins. I haven’t checked though.

    If you just want a quick, effective way of blocking ia_archiver, you could always add a rule to your site config or .htaccess. For example, on Apache servers with mod_rewrite enabled, the following rules will deny any/all requests whose user agent contains ia_archiver:

    # Return 403 Forbidden to any request whose User-Agent contains "ia_archiver" (case-insensitive)
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (ia_archiver) [NC]
    RewriteRule .* - [F,L]

    Similar rules are available for other servers (e.g., Nginx), so it’s definitely possible to block any desired user agent.
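
    As a rough sketch, the same approach could be extended to cover both user agents named in this topic (ia_archiver and archive.org_bot). This assumes Apache with mod_rewrite available and .htaccess overrides allowed; the combined pattern is an illustration rather than a tested rule from the plugin. It also only helps if the crawler actually identifies itself with one of these strings.

    ```apache
    # Sketch only: assumes Apache with mod_rewrite and .htaccess overrides enabled.
    <IfModule mod_rewrite.c>
        RewriteEngine On
        # Match either user agent mentioned in this topic, case-insensitively
        RewriteCond %{HTTP_USER_AGENT} (ia_archiver|archive\.org_bot) [NC]
        # Answer every matching request with 403 Forbidden
        RewriteRule .* - [F]
    </IfModule>
    ```

    The <IfModule> wrapper simply skips the rules on servers where mod_rewrite is not loaded, instead of triggering a server error.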
