Umay AI Markdown

Description

Modern AI agents (ChatGPT, Claude, Perplexity, Gemini, etc.) work much better with Markdown than HTML. Umay AI Markdown inspects the incoming Accept header and, only when text/markdown is requested, intercepts the response and serves a clean, agent-friendly Markdown representation of the page.

Browsers, search engines, and any client that does not explicitly ask for Markdown receive the unchanged HTML response. There is no settings page, no cron job, and no external service call.
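
The negotiation rule above can be sketched in a few lines. This is an illustrative Python sketch, not the plugin's PHP code: the helper name wants_markdown and the treatment of q=0 are assumptions about reasonable behaviour, not a description of the actual parser.

```python
def wants_markdown(accept_header: str) -> bool:
    """Return True only when the Accept header explicitly lists text/markdown.

    Hypothetical helper: wildcards such as */* deliberately do not count,
    mirroring the "only when text/markdown is requested" rule.
    """
    for part in accept_header.split(","):
        fields = [f.strip().lower() for f in part.split(";")]
        media_type, params = fields[0], fields[1:]
        if media_type != "text/markdown":
            continue
        # Assumption: an explicit q=0 means "not acceptable" (HTTP semantics).
        for param in params:
            if param.startswith("q="):
                try:
                    if float(param[2:]) == 0.0:
                        return False
                except ValueError:
                    pass  # malformed weight: treat it as if it were absent
        return True
    return False
```

A typical browser header such as text/html,application/xhtml+xml,*/*;q=0.8 does not match, so the visitor keeps receiving HTML.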

Key features

  • Zero configuration — install, activate, done.
  • Only triggers when Accept: text/markdown is present. Regular visitors and search engines are never affected.
  • Hybrid content extraction: uses the_content for posts/pages, falls back to a DOM-based extractor for archives, taxonomies, and the homepage.
  • Powered by the industry-standard league/html-to-markdown library.
  • Transient-cached for 12 hours per URL (sha256-keyed). Auto-invalidated on save_post, term edits, theme switches, and menu updates.
  • Built-in IP rate limiter (30 requests / minute by default) to mitigate abuse.
  • Strict input sanitization, header injection protection, libxml entity hardening (XXE-safe), and full WordPress Coding Standards compliance.
  • PSR-4 autoloaded, namespaced OOP code. No globals.
  • Sends Vary: Accept, X-Robots-Tag: noindex, and X-Content-Type-Options: nosniff on every Markdown response.
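
To make the rate-limiting feature concrete, here is a minimal Python sketch of a per-IP limiter using the documented default of 30 requests per minute. It assumes a fixed-window strategy and in-memory storage; the plugin's real strategy and storage are not documented here, and the class name is hypothetical.

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per `window` seconds for each client IP."""

    def __init__(self, limit: int = 30, window: int = 60, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.counts = defaultdict(int)  # (ip, window index) -> request count

    def allow(self, ip: str) -> bool:
        # Assumption: requests are counted in fixed, non-overlapping windows.
        bucket = (ip, int(self.clock() // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True
```

The sketch never evicts old windows, so a production version would need expiry (for example, storing counters with a time-to-live).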

What gets sent to AI agents

Each Markdown response includes a YAML front-matter block with the page title, site name, canonical URL, and ISO-8601 generation timestamp, followed by the page body converted to Markdown. Navigation, footer, sidebars, scripts, styles, comment forms, related posts, and other page chrome are stripped before conversion.
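
The shape of such a response can be sketched as follows. This Python example is illustrative only: the key names (title, site, url, generated) are taken from the screenshot description, while the quoting style and the function name are assumptions.

```python
from datetime import datetime, timezone


def build_markdown_document(title: str, site: str, url: str, body: str,
                            generated: datetime) -> str:
    """Prefix a Markdown body with a YAML front-matter block."""

    def quote(value: str) -> str:
        # Double-quote and escape so colons or quotes in titles stay valid YAML.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    front_matter = [
        "---",
        f"title: {quote(title)}",
        f"site: {quote(site)}",
        f"url: {quote(url)}",
        f"generated: {quote(generated.isoformat(timespec='seconds'))}",
        "---",
    ]
    return "\n".join(front_matter) + "\n\n" + body.strip() + "\n"
```

Calling it with a timezone-aware datetime such as datetime.now(timezone.utc) yields an ISO-8601 timestamp with an explicit UTC offset.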

Filters

Four filters are available for advanced customization:

  • umay_mdn_bypass — Return true to skip Markdown handling for the current request.
  • umay_mdn_cache_ttl — Override the default 12-hour cache lifetime (in seconds, minimum 60).
  • umay_mdn_rate_limit — Override the default 30-requests-per-minute rate limit.
  • umay_mdn_converter_options — Modify the league/html-to-markdown converter options array.

Screenshots

  • Markdown response served when an AI agent requests Accept: text/markdown — full HTTP headers (Content-Type, Vary, X-Cache, X-Robots-Tag, X-Content-Type-Options, X-Markdown-Generator), followed by a YAML frontmatter block (title, site, URL, generated timestamp) and the converted Markdown body.
  • Built-in transient cache in action: the first request to a URL returns X-Cache: MISS (Markdown generated on the fly), the next request to the same URL returns X-Cache: HIT (served from cache, no rendering work).
  • Transparent content negotiation: same WordPress URL, two different responses based on the Accept request header — browsers receive text/html (the normal theme output), AI agents receive text/markdown (clean, machine-readable content) with the plugin’s enriched header set.

Installation

  1. Upload the markdown-negotiator folder to /wp-content/plugins/, or upload the ZIP via Plugins > Add New > Upload Plugin.
  2. Activate the plugin.
  3. There is no settings page. The plugin starts working immediately.

To verify:

curl -i -H "Accept: text/markdown" https://your-site.com/

The -i flag prints the response headers. You should see Content-Type: text/markdown; charset=utf-8, followed by a YAML front-matter block and a Markdown body.
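
The same check can be scripted. The sketch below uses only the Python standard library; the function names are hypothetical, and the front-matter test simply looks for a leading --- line, which is an assumption about how the response begins.

```python
import urllib.request


def build_markdown_request(url: str) -> urllib.request.Request:
    """Build a GET request that explicitly asks for text/markdown."""
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})


def is_markdown_response(content_type: str, body: str) -> bool:
    """Check the two things the curl test looks for: media type and front matter."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == "text/markdown" and body.lstrip().startswith("---")


def check_site(url: str) -> bool:
    """Fetch `url` over the network and report whether Markdown came back."""
    with urllib.request.urlopen(build_markdown_request(url), timeout=10) as resp:
        content_type = resp.headers.get("Content-Type", "")
        body = resp.read().decode("utf-8", errors="replace")
    return is_markdown_response(content_type, body)
```

Run check_site("https://your-site.com/") against your own site; it should return True once the plugin is active.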

FAQ

Does it slow down my normal site?

No. The plugin hooks into template_redirect at priority 1 and returns immediately if the request does not include Accept: text/markdown. The cost is roughly a single if check per request.

Does this plugin contact any external services?

No. The plugin does not call any external service, does not send analytics, does not check for updates against a remote server, and does not load any remote assets. All HTML-to-Markdown conversion happens locally using the bundled league/html-to-markdown library.

How is the cache invalidated?

Per-URL cache keys are deleted on save_post and transition_post_status for the affected post. Term edits, theme switches, and menu updates flush the entire Markdown cache.
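
The caching behaviour described here (SHA-256 keys per URL, a 12-hour default lifetime with a 60-second minimum, per-URL deletion, and a full flush) can be modelled with a toy in-memory cache. This Python sketch is not the plugin's transient-based implementation; the class, the method names, and the umay_mdn_ key prefix are all assumptions.

```python
import hashlib


class MarkdownCache:
    """Toy per-URL cache keyed by a SHA-256 digest of the URL."""

    DEFAULT_TTL = 12 * 60 * 60  # 12 hours, in seconds
    MIN_TTL = 60

    def __init__(self, ttl: int = DEFAULT_TTL):
        self.ttl = max(self.MIN_TTL, ttl)  # enforce the documented 60-second floor
        self.store = {}  # cache key -> (expiry timestamp, markdown)

    @staticmethod
    def key_for(url: str) -> str:
        # Hypothetical key format; the "umay_mdn_" prefix is an assumption.
        return "umay_mdn_" + hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str, now: float):
        entry = self.store.get(self.key_for(url))
        if entry is None or entry[0] <= now:
            return None  # MISS: nothing cached, or the entry expired
        return entry[1]  # HIT

    def set(self, url: str, markdown: str, now: float) -> None:
        self.store[self.key_for(url)] = (now + self.ttl, markdown)

    def delete(self, url: str) -> None:
        # Per-URL invalidation, as on save_post for the affected post.
        self.store.pop(self.key_for(url), None)

    def flush_all(self) -> None:
        # Full flush, as on term edits, theme switches, and menu updates.
        self.store.clear()
```

Hashing the URL keeps every key the same length regardless of how long the URL or its query string is.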

Is logged-in personalization supported?

No. Logged-in requests always fall back to HTML to avoid leaking nonces, admin bars, or per-user content into a shared Markdown cache. This is by design.

How do I clear the cache manually?

Deactivate and re-activate the plugin. Deactivation calls Cache::flush_all().

Is the Markdown response indexable by search engines?

No. Every Markdown response includes X-Robots-Tag: noindex to prevent duplicate-content issues. The HTML version remains the canonical, indexable representation.

Does this work with caching plugins like LiteSpeed Cache, WP Rocket, or W3 Total Cache?

The plugin sets Vary: Accept so any well-behaved cache layer will store the Markdown and HTML variants separately. If your cache layer ignores the Vary header, configure it to bypass the cache for requests that carry Accept: text/markdown.
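
Why Vary: Accept matters can be shown with a simplified cache-key function. This Python sketch assumes a cache that keys entries on the URL plus the raw value of each request header named in Vary; real cache layers normalize headers differently, and the function name is hypothetical.

```python
def cache_key(url: str, request_headers: dict, vary: str) -> tuple:
    """Build a cache key from the URL plus every request header named in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    varied = tuple(
        (name, lowered.get(name, ""))
        for name in sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    )
    return (url, varied)
```

With vary="Accept", an HTML request and a Markdown request for the same URL produce different keys. With an empty vary value, which models a cache that ignores the header, both requests collapse onto one key and one variant would be served to everyone.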

Contributors & Developers

“Umay AI Markdown” is open source software.

Changelog

1.1.1

  • Aligned text domain with WordPress.org plugin slug: changed from umay-ai-markdown to markdown-negotiator across plugin header, gettext calls, and POT file (per WP.org review feedback).
  • Hardened Markdown response output: replaced raw echo $markdown with wp_kses( $markdown, array() ) to satisfy Plugin Check’s late-escape rule while preserving Markdown syntax.
  • Removed bundled tr_TR translation files; only the English POT template ships in the WordPress.org distribution. Translations now flow through translate.wordpress.org.

1.1.0

  • Renamed plugin from “Markdown Negotiator” to “Umay AI Markdown”; slug and text domain changed to markdown-negotiator (per WordPress.org review feedback to ensure the name is distinctive).
  • Removed deprecated libxml_disable_entity_loader() calls in ContentExtractor and Converter. XXE protection unchanged: libxml 2.9+ disables external entities by default, and LIBXML_NONET is still passed to loadHTML().
  • Refactored Tier 2 (non-singular) request pipeline: replaced the open ob_start( $callback ) on template_redirect with a template_include filter that opens and closes its buffer (ob_start paired with ob_get_clean) inside a single function scope, satisfying Plugin Check’s buffer-pairing requirement.
  • Added try/catch around the template render so a fatal in a theme/plugin returns a clean 500 instead of a half-buffered response.

1.0.1

  • HTML sanitization hardening: strip <style>, <script>, <noscript> blocks via regex (defense in depth on top of league/html-to-markdown’s remove_nodes).
  • Lazy-load image normalization: promotes data-lazy-src / data-src / data-original and the first srcset URL into the real src. Drops empty/placeholder images.
  • Page-builder anchor cleanup: anchors with href="#elementor-action:...", javascript:, or bare # are unwrapped to plain text.
  • Internationalization: text domain is now loaded on init and a base POT file ships in /languages/.

1.0.0

  • Initial release.