Crawlantix AI Bot Tracker

Description

Crawlantix AI Bot Tracker monitors visits from 30+ AI crawlers and gives you visibility into how bots interact with your site. See which bots visit, what pages they crawl, and catch misbehaving bots with a built-in logging-only honeypot. Lightweight, privacy-first, and fully functional out of the box.

Free forever — no artificial limits on bot detection.

Tracked Bots (30+):

  • GPTBot / ChatGPT-User / OAI-SearchBot (OpenAI)
  • ClaudeBot / Claude-Web / anthropic-ai (Anthropic)
  • Googlebot / Google-Extended (Google / Gemini)
  • bingbot (Microsoft / Copilot)
  • PerplexityBot (Perplexity)
  • DeepSeek, Qwen (Alibaba), Mistral AI
  • Applebot-Extended (Apple Intelligence)
  • Meta-ExternalAgent (Meta AI)
  • Bytespider (ByteDance)
  • CCBot (Common Crawl), Amazonbot, YouBot, DuckDuckBot
  • AI2Bot, Diffbot, Timpibot, PetalBot
  • SemrushBot, AhrefsBot, DataForSeoBot, MJ12bot, DotBot

Features:

Lightweight Bot Detection (30+ Bots)

Hooks into the WordPress init action at priority 1 and returns immediately when the User-Agent does not match a known AI bot, so the overhead for human visitors is negligible. All 30+ bots are tracked in the free version.
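
The early-exit pattern described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function name example_detect_ai_bot, the EXAMPLE_AI_BOTS map, and the case-insensitive substring match are all assumptions made for the example.

```php
<?php
// Hypothetical bot map: User-Agent token => provider label (not the plugin's real list).
const EXAMPLE_AI_BOTS = [
    'GPTBot'        => 'OpenAI',
    'ClaudeBot'     => 'Anthropic',
    'PerplexityBot' => 'Perplexity',
];

/**
 * Return the matching bot token, or null when the User-Agent is not a known bot.
 */
function example_detect_ai_bot(string $userAgent, array $bots = EXAMPLE_AI_BOTS): ?string
{
    if ($userAgent === '') {
        return null; // Nothing to match: bail before doing any work.
    }
    foreach ($bots as $token => $provider) {
        // Case-insensitive substring match on the bot token (an assumed strategy).
        if (stripos($userAgent, $token) !== false) {
            return $token;
        }
    }
    return null; // Human or unknown client: the caller should return immediately.
}

// Inside WordPress this would run early on init; guarded so the file also runs standalone.
if (function_exists('add_action')) {
    add_action('init', function () {
        $ua = isset($_SERVER['HTTP_USER_AGENT']) ? (string) $_SERVER['HTTP_USER_AGENT'] : '';
        if (example_detect_ai_bot($ua) === null) {
            return; // Not an AI bot: skip all tracking work.
        }
        // ... record the visit here (storage is outside the scope of this sketch).
    }, 1);
}
```

For a typical browser User-Agent the loop finds no token and the callback returns before any database work happens.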

Dashboard with Charts

Clean dashboard with trend chart and provider breakdown pie chart (Chart.js, bundled locally). Summary cards show total visits, unique bots, pages crawled, and daily averages. No external dependencies.

Bot Activity Table

Dedicated tab showing all detected bots with visits, bytes transferred, 24h sparklines, verification status, and honeypot hit counts.

Crawled Pages

See which pages bots visit most, which bots crawl them, and when they were last seen.

Honeypot Endpoint (Logging Only)

A CSS-hidden, aria-hidden, rel=nofollow link is injected in the footer. Only raw link-extracting bots will follow it. Visits are logged for transparency. Active defense (blocking, tarpit, rate-limit, decoy, shadowban) is reserved for the paid build.
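
As a rough illustration of what such a hidden link might look like, the sketch below builds the markup as a string. The path /_ai-honeypot/ is the one named in the 1.0.5 changelog entry; the function name, the link text, the inline display:none style, and the tabindex attribute are assumptions for the example, not the plugin's actual output.

```php
<?php
/**
 * Build a hidden honeypot link. Link text and inline style are illustrative only.
 */
function example_honeypot_link(string $path = '/_ai-honeypot/'): string
{
    // Escape the path so it is safe inside an HTML attribute.
    $href = htmlspecialchars($path, ENT_QUOTES, 'UTF-8');

    // Hidden from sighted users (CSS), assistive technology (aria-hidden),
    // keyboard navigation (tabindex), and compliant crawlers (rel=nofollow).
    return '<a href="' . $href . '" rel="nofollow" aria-hidden="true" tabindex="-1"'
        . ' style="display:none">Archive</a>';
}

echo example_honeypot_link(), "\n";
```

A client that requests this URL has parsed raw hrefs while ignoring both the nofollow hint and the fact that the link is not visible, which is why a hit is a useful signal.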

Bot Verification

Reverse DNS (FCrDNS) verification for major bots — confirms Googlebot, GPTBot, ClaudeBot, etc. are actually who they claim to be.
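
A forward-confirmed reverse DNS check generally has three steps: resolve the IP to a hostname, confirm the hostname belongs to the expected domain, then resolve the hostname back and confirm it returns the same IP. The sketch below shows that flow for IPv4 using PHP's gethostbyaddr() and gethostbynamel(). The function name, the injectable resolver parameters (added so the logic can be exercised without network access), and the suffix list are assumptions for the example, not the plugin's implementation.

```php
<?php
/**
 * Forward-confirmed reverse DNS check (IPv4 only in this sketch).
 *
 * @param string        $ip       Client IP address.
 * @param string[]      $suffixes Hostname suffixes accepted for the claimed bot.
 * @param callable|null $reverse  IP => hostname resolver (defaults to gethostbyaddr).
 * @param callable|null $forward  Hostname => list of IPv4 addresses (defaults to gethostbynamel).
 */
function example_verify_fcrdns(
    string $ip,
    array $suffixes,
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // Step 1: reverse lookup. gethostbyaddr() returns the input IP unchanged on failure.
    $host = $reverse($ip);
    if (!is_string($host) || $host === '' || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the hostname must end with one of the expected domains,
    // matched on a label boundary so "evilgooglebot.com" does not pass.
    $matches = false;
    foreach ($suffixes as $suffix) {
        $suffix = '.' . ltrim(strtolower($suffix), '.');
        if (substr($host, -strlen($suffix)) === $suffix) {
            $matches = true;
            break;
        }
    }
    if (!$matches) {
        return false;
    }

    // Step 3: the forward lookup must resolve back to the original IP.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

Called with only the first two arguments, the function performs live DNS lookups, so in practice it belongs off the request's critical path.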

Privacy First

IP addresses are SHA-256 hashed with a per-install salt before storage. Raw IPs are never saved. Includes WordPress Privacy API exporter and eraser hooks so data-subject access and erasure requests can flow through the standard WordPress Tools → Personal Data workflow.
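
A salted hash of this kind might look like the sketch below. It also normalizes the address first, in the spirit of the IPv6 normalization noted in the 2.0.5 changelog, so that equivalent spellings produce the same hash. The function names, the salt-then-separator concatenation, and the way the salt is generated are assumptions for illustration; the plugin's exact construction is not documented here.

```php
<?php
/**
 * Normalize an IP literal to canonical text form so equivalent spellings hash identically.
 */
function example_normalize_ip(string $ip): ?string
{
    $packed = @inet_pton(trim($ip)); // Warning suppressed: invalid input returns false.
    if ($packed === false) {
        return null; // Not a valid IPv4/IPv6 literal.
    }
    $text = inet_ntop($packed);
    return $text === false ? null : $text;
}

/**
 * Return a 64-character hex SHA-256 digest of salt + normalized IP, or null for invalid input.
 */
function example_hash_ip(string $ip, string $salt): ?string
{
    $normalized = example_normalize_ip($ip);
    if ($normalized === null) {
        return null;
    }
    return hash('sha256', $salt . '|' . $normalized);
}

// Assumed: a random salt generated once per install and stored as an option.
$salt = bin2hex(random_bytes(32));
echo example_hash_ip('2001:0db8:0000:0000:0000:0000:0000:0001', $salt), "\n";
```

Because the salt differs per install, the same IP yields unrelated hashes on different sites, and the hash cannot be checked against a precomputed table without knowing the salt.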

AI Discovery Layer

Serves ai-plugin.json (a discovery manifest that tells visiting AI agents the site is monitored) and llms.txt / llms-full.txt (text content authored by the admin via WordPress pages with slugs llms-txt and llms-full-txt) at the site root.

Data Retention

Bot visit data is retained for 30 days. Older records are automatically pruned via WP-Cron.
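
A scheduled prune of this sort could be wired up roughly as shown below. The created_at column name appears in the 2.0.3 changelog entry; everything else (the function name, the example_bot_visits table, the example_prune_bot_visits hook, the daily schedule, and UTC timestamps) is a hypothetical stand-in rather than the plugin's real schema or hook names.

```php
<?php
/**
 * Compute the UTC cutoff timestamp: rows older than this are eligible for pruning.
 */
function example_retention_cutoff(int $now, int $days = 30): string
{
    return gmdate('Y-m-d H:i:s', $now - $days * 86400);
}

// WordPress wiring, guarded so the file also runs standalone.
if (function_exists('add_action') && isset($GLOBALS['wpdb'])) {
    add_action('example_prune_bot_visits', function () {
        global $wpdb;
        $table = $wpdb->prefix . 'example_bot_visits'; // Hypothetical table name.
        $wpdb->query(
            $wpdb->prepare(
                "DELETE FROM {$table} WHERE created_at < %s",
                example_retention_cutoff(time())
            )
        );
    });

    // Schedule the prune once; WP-Cron then fires the hook daily.
    if (!wp_next_scheduled('example_prune_bot_visits')) {
        wp_schedule_event(time(), 'daily', 'example_prune_bot_visits');
    }
}
```

WP-Cron only runs when the site receives traffic, so on a quiet site records may persist somewhat beyond the nominal 30 days.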

External Services

This plugin connects to the following external services:

Reverse DNS Lookups

The bot verification feature performs reverse DNS (FCrDNS) lookups using PHP’s gethostbyaddr() and gethostbyname() functions to verify that bots are who they claim to be (e.g., confirming a request claiming to be Googlebot actually originates from Google’s network). These lookups send the bot’s IP address to your server’s configured DNS resolver and the authoritative DNS servers for the IP address’s reverse DNS zone. Under some privacy regimes, IP addresses may be considered personal data. This feature runs automatically when a known AI bot visits your site and cannot currently be disabled via the admin UI (a filter hook crawlantix_enable_verification is available for developers).
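
Developers who want to turn verification off could use the filter along these lines. This sketch assumes the crawlantix_enable_verification filter passes and expects a boolean, which the description above implies but does not spell out; the callback name mysite_disable_bot_verification is hypothetical.

```php
<?php
/**
 * Filter callback: always report verification as disabled.
 * Assumes the filter value is a boolean "enabled" flag.
 */
function mysite_disable_bot_verification($enabled): bool
{
    return false;
}

// Register only when WordPress is loaded, so the file also runs standalone.
if (function_exists('add_filter')) {
    add_filter('crawlantix_enable_verification', 'mysite_disable_bot_verification');
}
```

Placed in a small must-use plugin or a theme's functions.php, this would stop the DNS lookups; WordPress core's __return_false helper could serve as the callback instead.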

No other external services, third-party APIs, or remote requests are used by this plugin. All analytics data is stored locally in your WordPress database. Chart.js is bundled locally — no CDN requests are made.

Privacy Policy

Crawlantix AI Bot Tracker is designed with privacy as a core principle:

  • Bot traffic only. The plugin only tracks automated bot traffic identified by User-Agent strings. Human visitors are not tracked and no cookies are set.
  • No raw IP addresses stored. All IP addresses are SHA-256 hashed with a per-install random salt before storage (or AUTH_SALT when defined in wp-config.php). The original IP address cannot be recovered from the hash. Note: pseudonymous IP hashes may still be considered personal data under GDPR.
  • Data stored per bot visit: IP hash, User-Agent string, requested URL, HTTP referrer URL, request method, timestamp, and derived fields (bytes transferred, bot verification status). Referrer URLs may contain personal data depending on the referring site.
  • WordPress Privacy API integration. The plugin registers exporter and eraser callbacks with WordPress core, so data-subject access and erasure requests filed through Tools → Personal Data flow correctly.
  • No external data transmission. All analytics data remains in your local WordPress database. No data is sent to Crawlantix or third-party services. The only external communication is DNS lookups for bot verification (see External Services above).
  • Data retention controls. Bot visit data is automatically pruned after 30 days. Administrators can delete all collected data on uninstall via the “Delete Data on Uninstall” Settings toggle.

For sites that require a formal privacy policy disclosure, you may note: “We use the Crawlantix AI Bot Tracker plugin to monitor automated AI bot traffic to our site. This plugin records bot User-Agent strings, pseudonymous IP hashes, pages visited, referrer URLs, and timestamps for detected bot traffic only. Raw IP addresses are cryptographically hashed before storage. No human visitor data is collected.”

Premium Version

Crawlantix also offers paid tiers at crawlantix.com for site owners who need active bot defense in addition to monitoring. The paid build adds the following features on top of everything in this free version:

Protect tier

  • Active honeypot responses: HTTP 403 block, tarpit (random 5–25s delay with worker-exhaustion safeguards), rate limit 429, decoy content, shadowban.
  • Auto-block of repeat honeypot offenders, with configurable thresholds.
  • Per-IP response rules — apply a chosen response strategy to specific IP hashes (up to 200 rules).
  • Custom honeypot paths (up to 5) with a reserved-route safety list.
  • Email alerts for honeypot hits and parameter explosion patterns.
  • Robots.txt trap entries that catch non-compliant scrapers.
  • Optional override that suppresses the WordPress core /wp/v2/users REST endpoints (username-enumeration hardening, off by default and easy to opt back in).

Optimize tier

  • Full REST API at /wp-json/ai-tracker/v1/ with API key authentication and 13 endpoints (status, stats, page, trends, bots, top-pages, report, export, alerts, honeypot, crawled-pages, etc.).
  • GeoIP location tracking with MaxMind GeoLite2.
  • Crawl Analytics tab with deeper traffic-quality metrics.
  • Extended data retention up to 365 days.

Scale tier

  • Backup & restore — export all data as JSON; import with merge or replace modes.
  • Unlimited retention.
  • Priority support.

The paid build is a drop-in upgrade: same plugin slug, same database tables, same option keys, so all your historical bot data carries over with no migration step on your part.

Installation

  1. Upload the crawlantix-ai-bot-tracker folder to /wp-content/plugins/.
  2. Activate via Plugins → Installed Plugins.
  3. Navigate to AI Bot Analytics in the WordPress admin menu.

The plugin works immediately after activation with zero configuration.

FAQ

Does this slow down my site?

No. The detection code exits immediately if the User-Agent is not an AI bot. Human visitor performance is unaffected.

Does this store IP addresses?

No. IP addresses are SHA-256 hashed (with a per-install salt) before storage. You cannot recover the original IP from the hash.

Does it work on multisite?

Yes. The plugin uses $wpdb->prefix everywhere and namespaces all options with crawlantix_. Note: on network-activated multisite, uninstalling the plugin only cleans data for the main site. Sub-site data must be removed individually via the “Delete Data on Uninstall” toggle on each sub-site before deactivation.

Does the free version actively block bots?

No. The free Monitor tier is intentionally logging-only: it observes and reports, but does not actively respond. Active defense (HTTP 403 block, tarpit, rate-limit 429, decoy content, shadowban, automatic blocking of repeat offenders, per-IP response rules) is part of the paid Crawlantix Protect tier; see the Premium Version section.

How do I add a custom bot?

Use the crawlantix_bots filter:

```php
add_filter( 'crawlantix_bots', function( $bots ) {
    // Add a custom entry to the tracked-bot list.
    $bots['MyCustomBot'] = [ 'name' => 'My Bot', 'provider' => 'custom' ];
    return $bots;
} );
```

How do I publish my own llms.txt content?

Create a published WordPress page with the slug llms-txt (for /llms.txt) or llms-full-txt (for /llms-full.txt). The plugin will serve your page’s content as plain text at the well-known URL. AI agents and humans alike will see whatever you author there.

Contributors & Developers

“Crawlantix AI Bot Tracker” is open source software.

Changelog

2.0.5

  • Monitor-tier purification: this build is now intentionally focused on observability only.
  • New: WordPress Privacy API exporter and eraser hooks for data-subject access/erasure requests.
  • New: Per-install random salt for IP hashing on activation (replaces the brute-forceable site-URL fallback when AUTH_SALT is not defined).
  • New: IPv6 normalization before hashing (prevents inflated unique-IP counts from equivalent representations).
  • New: Schema migration is now re-run after upgrader_process_complete (closes the auto-update window where front-end bot traffic could write to columns that did not yet exist).
  • Improved: /ai-plugin.json is now a self-describing Monitor-tier badge with no advertised API surface. Cache-Control headers are now sent on the manifest and on /llms.txt.
  • Improved: Settings-saved success notice is driven by a per-user user_meta flash flag (prevents misleading notices triggered by a crafted query-string link, and is robust on hosts where caching plugins evict transients).

2.0.4

  • Safety: Uninstall no longer deletes data by default. New “Delete Data on Uninstall” toggle in Settings (default OFF). Prevents accidental data loss during plugin updates or reinstalls.

2.0.3

  • Security: Replace parse_str() with safe manual query parser to prevent nested array memory exhaustion.
  • Security: Add CSV formula injection sanitization.
  • Privacy: Redact IP hashes to first 8 characters in admin views.
  • Privacy: Honeypot link text now uses esc_html() for consistent output escaping.
  • Performance: Add composite database index (created_at, page_url) for pagination queries.
  • Performance: Cache blocked IP list in wp_cache (paid build only).
  • Performance: Cache url_to_postid() results with 5-minute TTL to reduce DB load under bot floods.

2.0.2

  • Security: Default to REMOTE_ADDR only for IP detection (no proxy header trust by default).
  • Performance: Batch sparkline query eliminates N+1 per-bot hourly data calls.
  • Performance: Migration lock wrapped in try/finally for guaranteed release.
  • Add xAI-Grok and PhindBot to bot detection list.

2.0.1

  • Security: Bot verification moved to shutdown hook (async, non-blocking).
  • Performance: N+1 query elimination via batch get_posts() for page lookups.
  • Performance: 4 new database indexes for common query patterns.
  • Detection: 12 new bots added with correct priority ordering.
  • Detection: IPv6 FCrDNS support with address normalization.
  • Data: http_referrer and request_method columns added for SEO analysis.

2.0.0

  • Expand bot detection from 15 to 30+ bots.
  • Add reverse DNS verification data for major bots.
  • Add bot categories.
  • Add 15 new provider colors in admin chart.

1.0.8

  • Add honeypot email alerts (paid-only feature) — configurable threshold, de-duped via transient.

1.0.7

  • Deployment testing and validation. No code changes from 1.0.6.

1.0.6

  • Add an optional filter to block the WordPress core REST endpoints /wp/v2/users and /wp/v2/users/{id} (off by default in the free build).

1.0.5

  • Add bytes transferred tracking per request (via shutdown hook).
  • Add unique URLs / base paths ratio metric for brute-force detection.
  • Add 24h sparkline SVGs per bot (server-rendered, no JS library).
  • Add bot categorization (ai_crawler, ai_search, ai_assistant, search_engine, archiver).
  • Add honeypot endpoint (/_ai-honeypot/) — CSS-hidden, aria-hidden, nofollow link in footer.
  • Add reverse DNS verification (FCrDNS) for Googlebot, GPTBot, ClaudeBot, and others.
  • Add bot categories JSON data file.
  • Add verified bots JSON data file.
  • Database migration: 5 new columns + alerts table.

1.0.4

  • Serve llms.txt and llms-full.txt from WordPress pages with matching slugs.
  • Add rewrite rules and direct request handlers for llms.txt endpoints.

1.0.3

  • Add admin menu as top-level page with dashicons-chart-area icon.

1.0.2

  • Add optional manifest_contact_email setting — when set, included in ai-plugin.json; omitted when blank.
  • Add optional manifest_legal_info_url setting — overrides the auto-detected privacy policy URL in ai-plugin.json.

1.0.1

  • Serve ai-plugin.json reliably without requiring a manual permalink flush after first install.
  • Use site-specific manifest contact email and legal URL values.

1.0.0

  • Initial release — 15 AI bots tracked, Chart.js admin dashboard, ai-plugin.json discovery.