Title: bbole's Replies | WordPress.org

---

# bbole


## Forum Replies Created

Viewing 6 replies - 1 through 6 (of 6 total)

 *   Forum: [Plugins](https://wordpress.org/support/forum/plugins-and-hacks/)
    In
   reply to: [[Website LLMs.txt] How to handle potential conflict: duplicated content](https://wordpress.org/support/topic/how-to-handle-potential-conflict-duplicated-content/)
 *  Thread Starter [bbole](https://wordpress.org/support/users/bbole/)
 * (@bbole)
 * [12 months ago](https://wordpress.org/support/topic/how-to-handle-potential-conflict-duplicated-content/#post-18528284)
 * Absolutely. Discoverability refers to a file being found by a crawler (e.g.,
   via links, sitemaps, or direct access), while indexation is a prerequisite for
   a page to appear in search engine results, which makes it a key component of
   discoverability by traditional search engines. The llms.txt specification is
   designed specifically for large language models (LLMs), not for traditional
   search engine indexing. Including these files in the sitemap index **_increases
   the likelihood_** that Googlebot will crawl and index them, treating them as
   regular web pages.
 * Since these files (especially .md versions of blog posts) often have content
   identical or near-identical to their HTML versions, Google may flag them as
   duplicate content. This **_could_** dilute our SEO rankings, confuse search
   algorithms, confuse people if they ever click on such results, or lead to
   penalties in Google Search Console, as it may interpret the .md files as
   alternate versions of the same page without proper canonicalization.
 * By excluding llms.txt, llms-full.txt, and related .md files from the sitemap
   index, we reduce the risk of traditional search engines **_indexing_** them,
   thereby minimizing their **_discoverability_** in search results. This aligns
   with the llms.txt proposal, which intends these files to be accessed directly
   by LLMs or AI agents (e.g., via [https://example.com/llms.txt](https://example.com/llms.txt))
   rather than surfaced in Google’s search results. LLMs don’t rely on sitemaps
   for discovery.
 * So:
    - Excluding the files from sitemap indexes
    - Testing robots.txt rules that allow specific AI crawlers while disallowing
      traditional crawlers
    - Testing canonicals
 * These steps reduce the likelihood of anything described above happening. To me,
   the first step is very straightforward: the plugin shouldn’t modify the sitemap
   index at all. Risk mitigation.
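
The robots.txt idea mentioned above can be tried out locally before touching a live site. What follows is only a minimal sketch, not something the plugin does: it uses Python's standard `urllib.robotparser` to check a candidate rule set. The rules themselves, and the choice of `GPTBot` as the AI crawler to allow, are assumptions made for illustration. Keep in mind that this parser does plain prefix matching (no wildcards), and that a `Disallow` rule only blocks crawling; on its own it does not guarantee that a URL never shows up in an index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set (an assumption, not taken from the plugin):
# one AI crawler may fetch the llms files, two traditional crawlers may not.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /llms.txt
Allow: /llms-full.txt

User-agent: Googlebot
User-agent: Bingbot
Disallow: /llms.txt
Disallow: /llms-full.txt
"""


def can_fetch(agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` is allowed to fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot", "Bingbot"):
        for path in ("/llms.txt", "/llms-full.txt", "/blog/"):
            print(f"{agent:10} {path:15} allowed={can_fetch(agent, path)}")
```

Running the script prints one line per crawler and path, which makes it easy to spot a rule that blocks regular blog pages by accident.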
 *   Forum: [Plugins](https://wordpress.org/support/forum/plugins-and-hacks/)
    In
   reply to: [[Website LLMs.txt] How to handle potential conflict: duplicated content](https://wordpress.org/support/topic/how-to-handle-potential-conflict-duplicated-content/)
 *  Thread Starter [bbole](https://wordpress.org/support/users/bbole/)
 * (@bbole)
 * [12 months ago](https://wordpress.org/support/topic/how-to-handle-potential-conflict-duplicated-content/#post-18528237)
 * Most people have their sitemap index added to Google Search Console, etc. After
   activating this plugin, a new sitemap for llms is added to that sitemap index,
   when it shouldn’t be. That’s why it’s picked up by GSC and other webmaster tools.
   I have deactivated the plugin for that reason.
    -  This reply was modified 12 months ago by [bbole](https://wordpress.org/support/users/bbole/).
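
To confirm whether a sitemap index really lists an extra llms sitemap, the index can be parsed and its `<loc>` entries filtered. This is a minimal sketch under stated assumptions: the sample XML and the file name `llms-sitemap.xml` are hypothetical (the actual name the plugin uses is not confirmed here), and matching on the substring `llms` is just a heuristic.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap index used only for illustration.
SAMPLE_INDEX = """\
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/llms-sitemap.xml</loc></sitemap>
</sitemapindex>
"""


def llms_sitemaps(index_xml: str, marker: str = "llms") -> list[str]:
    """Return the child sitemap URLs whose location contains `marker`."""
    root = ET.fromstring(index_xml)
    locs = [
        el.text.strip()
        for el in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if el.text
    ]
    return [loc for loc in locs if marker in loc.lower()]


if __name__ == "__main__":
    print(llms_sitemaps(SAMPLE_INDEX))
```

To check a real site, the XML of its own sitemap index would be passed in instead of `SAMPLE_INDEX`.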
 *   Forum: [Plugins](https://wordpress.org/support/forum/plugins-and-hacks/)
    In
   reply to: [[Website LLMs.txt] Why no single post .md files are generated?](https://wordpress.org/support/topic/why-no-single-post-md-files-are-generated/)
 *  [bbole](https://wordpress.org/support/users/bbole/)
 * (@bbole)
 * [12 months ago](https://wordpress.org/support/topic/why-no-single-post-md-files-are-generated/#post-18528018)
 * Me too. As per Jeremy Howard’s webpage on this, I was expecting 2 files:
 * **llms.txt** – The main index file that goes in our website’s root directory
   (`/llms.txt`). A structured markdown file providing:
    - A project overview
    - Links to detailed markdown files
    - Organized sections of resources
 * **llms-full.txt** (or similar) – This is a processed/expanded version containing
   the actual content from all the URLs referenced in llms.txt.
 * We then need to create md versions of each blog article first, before actually
   generating an llms.txt and an llms-full.txt, so that each article has a
   corresponding .md path/alternate. (The [specs](https://llmstxt.org/) suggest
   that each HTML page should have a corresponding `.md` version at the same URL
   with `.md` appended, e.g., `page.html` → `page.html.md`.) The plugin doesn’t
   generate these single md files.
 * Ref: [https://jina.ai/reader/](https://jina.ai/reader/)
 * It’s actually the llms.txt that works like the ‘traditional’ sitemap, but without
   calling it a sitemap. It’s a file that references the markdown versions you’ve
   created beforehand and provides structure.
 * And the “full” file should be a concatenated version of all the content in those
   md pages.
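
The layout described above (per-article `.md` files first, then an index, then a concatenated "full" file) can be sketched in a few lines. This is an illustration under assumptions, not the plugin's code: the `Page` type, the function names, and the single-section layout are hypothetical, and the `index.html.md` handling for URLs ending in `/` follows my reading of the llmstxt.org proposal.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical container for one article that already has markdown."""
    title: str
    url: str        # URL of the HTML page
    markdown: str   # markdown version of the page content


def md_url(url: str) -> str:
    """Append `.md` to the page URL; URLs without a file name get index.html.md."""
    return url + "index.html.md" if url.endswith("/") else url + ".md"


def build_llms_txt(site_name: str, summary: str, section: str, pages: list[Page]) -> str:
    """Build a small llms.txt: H1 title, blockquote summary, one H2 link section."""
    lines = [f"# {site_name}", "", f"> {summary}", "", f"## {section}", ""]
    lines += [f"- [{p.title}]({md_url(p.url)})" for p in pages]
    return "\n".join(lines) + "\n"


def build_llms_full_txt(pages: list[Page]) -> str:
    """Concatenate the markdown of every referenced page into one document."""
    return "\n\n".join(p.markdown.strip() for p in pages) + "\n"


if __name__ == "__main__":
    pages = [
        Page("Hello", "https://example.com/blog/hello.html", "# Hello\n\nBody.\n"),
        Page("About", "https://example.com/about/", "# About\n\nText."),
    ]
    print(build_llms_txt("Example", "A sample site.", "Blog", pages))
    print(build_llms_full_txt(pages))
```

The point of the sketch is the order of operations: the index and the full file are both derived from markdown pages that have to exist beforehand.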
 *   Forum: [Plugins](https://wordpress.org/support/forum/plugins-and-hacks/)
    In
   reply to: [[Website LLMs.txt] How to handle potential conflict: duplicated content](https://wordpress.org/support/topic/how-to-handle-potential-conflict-duplicated-content/)
 *  Thread Starter [bbole](https://wordpress.org/support/users/bbole/)
 * (@bbole)
 * [12 months ago](https://wordpress.org/support/topic/how-to-handle-potential-conflict-duplicated-content/#post-18527989)
 * Why is such a sitemap-llms created at all? It’s not part of the procedure suggested
   by Jeremy Howard: [https://llmstxt.org/](https://llmstxt.org/)
 *   Forum: [Plugins](https://wordpress.org/support/forum/plugins-and-hacks/)
    In
   reply to: [[Website LLMs.txt] How to handle potential conflict: duplicated content](https://wordpress.org/support/topic/how-to-handle-potential-conflict-duplicated-content/)
 *  Thread Starter [bbole](https://wordpress.org/support/users/bbole/)
 * (@bbole)
 * [12 months ago](https://wordpress.org/support/topic/how-to-handle-potential-conflict-duplicated-content/#post-18527955)
 * Also, [@ryhowa](https://wordpress.org/support/users/ryhowa/), I’m asking because [https://beebole.com/blog/sitemap_index.xml](https://beebole.com/blog/sitemap_index.xml)
   is added to GSC, Bing Webmaster Tools, etc., and the new llms sitemap is there.
 *   Forum: [Plugins](https://wordpress.org/support/forum/plugins-and-hacks/)
    In
   reply to: [[Website LLMs.txt] How to handle potential conflict: duplicated content](https://wordpress.org/support/topic/how-to-handle-potential-conflict-duplicated-content/)
 *  Thread Starter [bbole](https://wordpress.org/support/users/bbole/)
 * (@bbole)
 * [12 months ago](https://wordpress.org/support/topic/how-to-handle-potential-conflict-duplicated-content/#post-18527929)
 * Hi Ryan,
 * Thanks a lot for responding! How do we know 100% that Google’s and other ‘traditional’
   bots won’t index such files?
   Also, regarding this plugin: does it generate a markdown version of each article
   too? Thank YOU
