Dynamic llms.txt Generator – WordPress Plugin Case Study

Project Background

As AI tools and LLM-powered workflows continue to grow, many developers and businesses are looking for better ways to provide structured website content to these systems. 

However, most websites are designed primarily for browsers, not for machine-readable content extraction. WordPress websites, in particular, contain a mix of HTML markup, scripts, styles, shortcodes, and content stored across multiple plugins. 

This makes it difficult for external tools to extract meaningful content without building complex crawlers. 

To address this challenge, we developed Dynamic llms.txt Generator, a lightweight WordPress plugin that automatically generates a structured llms.txt file containing clean and organized content from a website. 

Instead of requiring external tools to crawl multiple pages, the plugin provides a single structured text endpoint that represents the key content of the site. 

The Problem

Large WordPress websites such as blogs, product catalogs, or recipe sites often store content in several different places. Extracting meaningful information from them is not as simple as scraping page HTML.

Some of the common challenges include:

  • HTML scraping brings in unnecessary elements such as scripts, styles, and layout markup. 
  • A significant amount of content is stored in Advanced Custom Fields (ACF) instead of the main content editor. 
  • SEO metadata is scattered across different plugins and database tables. 
  • Crawling hundreds of URLs can be inefficient and slow. 
  • Generating exports dynamically on every request can impact performance. 

While a few plugins already existed for generating llms.txt files in WordPress, most of them included only minimal information, such as the page title, URL, and a short description. 

During research, we came across discussions explaining a more comprehensive llms.txt structure that included headings, metadata, and cleaned content. Developers reported that this format worked well with LLM-based tools. 

This led us to build a WordPress plugin that generates richer, structured content exports while supporting the way modern WordPress websites are built. 

The Solution

Dynamic llms.txt Generator introduces a simple tool inside the WordPress admin panel that automatically generates a single llms.txt file containing selected site content.

Website administrators can configure:

  • Which content types should be included 
  • How often the file should be regenerated 
  • How much content should be exported per page 

The plugin extracts content from posts, pages, products, and custom post types, cleans the text, organizes the information into a structured format, and saves the result as llms.txt in the website’s root directory.

External tools can then access the entire structured dataset through a single endpoint: yoursite.com/llms.txt (where yoursite.com is the site's own domain).

This approach eliminates the need for external systems to crawl multiple pages or parse HTML templates.

Key Features

Configurable Content Selection

Administrators can control exactly what content appears in the generated file.

The plugin allows users to:

  • Select which post types to include (posts, pages, products, etc.) 
  • Set limits for the number of entries per post type 
  • Restrict the word count for each page 
  • Arrange content order through drag-and-drop settings 

This makes the tool flexible enough for both small websites and large content platforms.
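Taken together, these settings amount to a filter, a per-type limit, a word cap, and an ordering. The sketch below shows one way to combine them in Python for illustration only (the plugin itself is a PHP WordPress plugin); the `Post` record and the setting names `post_types`, `max_entries`, and `max_words` are hypothetical stand-ins for however the plugin actually stores its options.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Post:
    post_type: str
    title: str
    body: str  # already-cleaned plain text


def select_posts(posts: List[Post], post_types: List[str],
                 max_entries: Dict[str, int], max_words: Optional[int]) -> List[Post]:
    """Apply hypothetical selection settings.

    post_types  -- included types, in the order the admin arranged them
    max_entries -- per-type entry limit; types missing from the dict are unlimited
    max_words   -- word cap per entry; None means no cap
    """
    selected = []
    # Iterating over post_types (not over posts) gives the admin-defined order.
    for post_type in post_types:
        matching = [p for p in posts if p.post_type == post_type]
        limit = max_entries.get(post_type)
        if limit is not None:
            matching = matching[:limit]
        for post in matching:
            # Splitting on whitespace also collapses line breaks in the body.
            words = post.body.split()
            # words[:None] keeps every word, so no cap applies when max_words is None.
            selected.append(replace(post, body=" ".join(words[:max_words])))
    return selected
```

Post types that are not listed (here, anything other than the selected types) never reach the output, and a type with no entry in `max_entries` is exported in full.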

[Screenshot: content settings]

Structured Content Export

The generated llms.txt file includes clearly structured sections that make it easier for downstream tools to understand the content. 

The export includes: 

  • Generation timestamp 
  • Source sitemap URL
  • Total pages processed 
  • Basic site metadata 
  • Page titles and descriptions 
  • Canonical URLs and language information 
  • Author and publication details 
  • Categories, tags, and custom taxonomies 
  • WooCommerce product metadata (when applicable) 

Each page also includes: 

  • Heading structure (H1–H6) extracted from the content 
  • Cleaned main content section containing readable text without HTML markup 
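A heading outline like this has to be read from the post HTML before the markup is stripped. A minimal sketch using Python's standard `html.parser` module is shown below; the `HeadingExtractor` class and `extract_headings` helper are hypothetical names, and any inline markup inside a heading (such as `<em>`) is simply flattened to its text.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class HeadingExtractor(HTMLParser):
    """Collect (level, text) pairs for every <h1>-<h6> element in an HTML fragment."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self.headings: List[Tuple[int, str]] = []
        self._level: Optional[int] = None  # level of the heading currently open
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            text = " ".join("".join(self._buffer).split())  # normalize whitespace
            if text:
                self.headings.append((self._level, text))
            self._level = None


def extract_headings(html_doc: str) -> List[Tuple[int, str]]:
    parser = HeadingExtractor()
    parser.feed(html_doc)
    parser.close()
    return parser.headings
```

For example, `<h2>Ingredients &amp; Tools</h2>` yields `(2, "Ingredients & Tools")`, which a later step can render as an indented outline.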

ACF-Aware Content Extraction

Modern WordPress websites frequently rely on Advanced Custom Fields (ACF) to structure their content. 

The plugin automatically scans ACF fields and extracts readable content from them, including repeaters and flexible content layouts. This ensures that content stored outside the default WordPress editor is still included in the export. 

WooCommerce Compatibility

For websites running WooCommerce, the plugin can also extract product-related metadata such as:

  • Product price 
  • SKU 
  • Product type 
  • Product categories 

This allows eCommerce websites to include structured product information within the exported file.
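To make the export structure concrete, the sketch below renders one page entry as plain text: metadata first, then the heading outline, then the cleaned content, with product fields appended only when the entry is a product. The `PageEntry` record, the field labels, and the `##`-style title line are assumptions for illustration; the plugin's exact output layout is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PageEntry:
    title: str
    url: str
    description: str = ""
    language: str = ""
    author: str = ""
    published: str = ""  # e.g. an ISO 8601 date
    categories: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    product: Optional[Dict[str, str]] = None  # set only for WooCommerce products (hypothetical keys)
    headings: List[Tuple[int, str]] = field(default_factory=list)
    content: str = ""


# Hypothetical product keys and labels, in output order.
PRODUCT_FIELDS = [("price", "Price"), ("sku", "SKU"),
                  ("type", "Product type"), ("categories", "Product categories")]


def render_entry(entry: PageEntry) -> str:
    """Render one page as a plain-text block: metadata, headings outline, cleaned content."""
    lines = [f"## {entry.title}", f"URL: {entry.url}"]
    # Optional metadata lines are emitted only when a value is present.
    for label, value in [("Description", entry.description), ("Language", entry.language),
                         ("Author", entry.author), ("Published", entry.published)]:
        if value:
            lines.append(f"{label}: {value}")
    if entry.categories:
        lines.append("Categories: " + ", ".join(entry.categories))
    if entry.tags:
        lines.append("Tags: " + ", ".join(entry.tags))
    if entry.product:
        for key, label in PRODUCT_FIELDS:
            if entry.product.get(key):
                lines.append(f"{label}: {entry.product[key]}")
    if entry.headings:
        lines.append("Headings:")
        # Indent two spaces per level below H1.
        lines.extend("  " * (level - 1) + f"H{level} {text}" for level, text in entry.headings)
    if entry.content:
        lines.extend(["Content:", entry.content])
    return "\n".join(lines)
```

Skipping empty fields keeps ordinary posts and pages compact while products carry their extra metadata in the same block.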

Performance-Friendly Generation

Instead of generating the entire file on every request, the plugin uses a caching approach to store processed content.

The llms.txt file can be updated:

  • Immediately when content changes 
  • On a scheduled basis (daily or weekly) 
  • Manually from the admin panel 

Because the final output is served as a static text file, normal website traffic does not trigger heavy processing.
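Serving a pre-built file only works if visitors never receive a half-written one. A common way to guarantee that, shown here as a sketch rather than the plugin's documented mechanism, is to write the new export to a temporary file in the same directory and atomically rename it over the old one. The `LlmsTxtWriter` class and its `dirty` flag are hypothetical: `mark_dirty()` stands in for the "content changed" trigger, while scheduled and manual runs call `regenerate(force=True)`.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


class LlmsTxtWriter:
    """Hypothetical regeneration helper: rebuilds llms.txt only when needed and
    swaps the new file in atomically so a request never reads a partial file."""

    def __init__(self, path: str, build: Callable[[], str]):
        self.path = Path(path)
        self.build = build  # returns the complete export text
        self.dirty = True   # nothing written yet, so the first run must build

    def mark_dirty(self) -> None:
        """Call when content changes (the 'update immediately' trigger)."""
        self.dirty = True

    def regenerate(self, force: bool = False) -> bool:
        """Rebuild if content changed, or unconditionally when force=True
        (scheduled or manual runs). Returns True if the file was rewritten."""
        if not (self.dirty or force):
            return False
        text = self.build()
        # Temp file in the same directory, so os.replace() stays on one filesystem.
        fd, tmp_path = tempfile.mkstemp(dir=str(self.path.parent), prefix=".llms-", suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                fh.write(text)
            # mkstemp creates the file with mode 0600; the web server must be able to read it.
            os.chmod(tmp_path, 0o644)
            os.replace(tmp_path, self.path)  # atomic rename on POSIX filesystems
        except BaseException:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
        self.dirty = False
        return True
```

Because the expensive `build()` call only runs inside `regenerate()`, ordinary page views never touch it; they just read whichever complete file is currently in place.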

SEO and Sitemap Integration

The plugin integrates with popular SEO plugins such as Yoast SEO and Rank Math.

When enabled, the llms.txt endpoint can be automatically added to the website sitemap, making it easier for automated systems to discover the file. 
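Whichever SEO plugin produces the sitemap, an entry for the file follows the sitemaps.org protocol: a `<url>` element with a `<loc>` and, optionally, a `<lastmod>`. The sketch below builds such an entry with Python's `xml.etree.ElementTree`; how the plugin actually hooks into Yoast SEO or Rank Math is not covered, and `https://example.com/llms.txt` is a placeholder URL.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def llms_sitemap(url: str, last_modified: datetime) -> str:
    """Return a minimal sitemaps.org <urlset> listing a single URL."""
    # Setting xmlns as a plain attribute puts all child elements in the sitemap namespace.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    entry = ET.SubElement(urlset, "url")
    ET.SubElement(entry, "loc").text = url
    # <lastmod> uses the W3C Datetime format, e.g. 2024-01-01T00:00:00+00:00
    ET.SubElement(entry, "lastmod").text = last_modified.isoformat(timespec="seconds")
    return ET.tostring(urlset, encoding="unicode")


xml_text = llms_sitemap("https://example.com/llms.txt", datetime(2024, 1, 1, tzinfo=timezone.utc))
```

A standalone sitemap file would normally also begin with an XML declaration; the fragment above shows only the entry structure.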

The admin interface also provides tools to clear plugin caches and ensure the file stays synchronized with site content. 

[Screenshot: cache management]

Challenges

Exporting Large Content Sets Efficiently

Generating a full export for a large website can slow down the site. 

To prevent this, the plugin uses a background processing approach and stores processed content before generating the final file. This keeps the generation process fast and avoids unnecessary database queries. 
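One way to picture this approach is batch processing with a per-post cache: each background tick handles a small batch, and every post's processed text is stored alongside its last-modified time so unchanged posts are reused instead of being re-queried and re-cleaned. The `SourcePost` record, cache shape, batch size, and `process` callable below are assumptions; in WordPress, that role might be filled by post meta or transients with batches scheduled through WP-Cron, but the plugin's exact mechanism isn't described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SourcePost:
    post_id: int
    modified: str  # last-modified timestamp as stored by the CMS
    raw_html: str


# Hypothetical cache shape: post_id -> (modified timestamp, processed text).
Cache = Dict[int, Tuple[str, str]]


def run_batch(posts: List[SourcePost], cursor: int, cache: Cache,
              process: Callable[[str], str], batch_size: int = 50) -> Optional[int]:
    """Process one batch starting at `cursor`. Returns the next cursor, or None
    once every post has been handled. Each call is meant to be one background tick."""
    for post in posts[cursor:cursor + batch_size]:
        cached = cache.get(post.post_id)
        if cached is not None and cached[0] == post.modified:
            continue  # unchanged since the last run: reuse the stored result
        cache[post.post_id] = (post.modified, process(post.raw_html))
    next_cursor = cursor + batch_size
    return next_cursor if next_cursor < len(posts) else None


def assemble(posts: List[SourcePost], cache: Cache) -> str:
    """Build the final file from cached results only; nothing is reprocessed here."""
    return "\n\n".join(cache[post.post_id][1] for post in posts)
```

A scheduler would call `run_batch` repeatedly, persisting `cache` and the cursor between ticks, and call `assemble` once it returns `None`. This sketch does not prune cache entries for deleted posts.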

Working with ACF-Based Page Structures

Many WordPress websites use flexible ACF layouts instead of standard content fields. 

To handle this, the plugin was designed to scan and process ACF fields recursively while ignoring layout configuration fields that do not contain visible content. 
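Recursive scanning can be sketched over the nested structure that ACF field values take when loaded (for example, via ACF's `get_fields()` in PHP, represented here as Python dicts and lists). Repeaters are lists of row dictionaries, and flexible content rows carry an `acf_fc_layout` key naming the layout. The walker below keeps non-empty strings and skips keys it treats as configuration. Apart from `acf_fc_layout`, the entries in `SKIP_KEYS` are hypothetical, and the rule for what counts as readable text is an assumption rather than the plugin's actual filter.

```python
from typing import Any, List

# Keys treated as layout/configuration rather than visible content.
# "acf_fc_layout" is the key ACF uses for a flexible-content row's layout name;
# the other entries are hypothetical examples of per-site settings fields.
SKIP_KEYS = {"acf_fc_layout", "background_color", "css_class", "section_id"}


def extract_acf_text(value: Any, key: str = "") -> List[str]:
    """Recursively collect readable strings from ACF-style nested field data."""
    if key in SKIP_KEYS:
        return []
    if isinstance(value, dict):  # a field group, repeater row, or flexible-content row
        texts: List[str] = []
        for k, v in value.items():
            texts.extend(extract_acf_text(v, k))
        return texts
    if isinstance(value, list):  # repeater or flexible-content rows
        texts = []
        for item in value:
            texts.extend(extract_acf_text(item, key))
        return texts
    if isinstance(value, str):
        # WYSIWYG fields may still contain HTML; cleaning happens in a later step.
        stripped = value.strip()
        return [stripped] if stripped else []
    # Numbers, booleans, and None are usually IDs or toggles, not prose.
    return []
```

Returning a flat list in document order lets repeater and flexible-content text be appended to the main content section exactly like editor content.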

Cleaning WordPress Content

Raw WordPress content often contains shortcodes, scripts, and layout elements that are not useful for LLM processing.

A multi-step cleaning process was implemented to remove unnecessary markup and ensure the final text remains readable.
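A cleaning pipeline along these lines can be sketched with Python's standard library. The steps (drop script and style blocks, drop HTML comments such as Gutenberg's `<!-- wp:... -->` block delimiters, strip shortcode tags while keeping their enclosed text, turn block-level tags into line breaks, remove the remaining tags, decode entities, collapse whitespace) follow the description above. The regular expressions are simplified assumptions that will not handle every shortcode form or malformed HTML; inside WordPress, functions such as `strip_shortcodes()` and `wp_strip_all_tags()` could cover much of this work.

```python
import html
import re

SCRIPT_STYLE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
COMMENTS = re.compile(r"<!--.*?-->", re.DOTALL)  # also removes Gutenberg block delimiters
# Opening or closing shortcode tags like [caption id="a1"] or [/caption]; enclosed text is kept.
SHORTCODES = re.compile(r"\[/?[A-Za-z][\w-]*(?:\s[^\]]*)?\]")
# Block-level tags become line breaks so paragraphs and list items stay separate.
BLOCK_TAGS = re.compile(r"<\s*/?\s*(?:p|div|li|ul|ol|h[1-6]|br)\b[^>]*>", re.IGNORECASE)
TAGS = re.compile(r"<[^>]+>")


def clean_content(raw: str) -> str:
    text = SCRIPT_STYLE.sub("", raw)
    text = COMMENTS.sub("", text)
    text = SHORTCODES.sub("", text)
    text = BLOCK_TAGS.sub("\n", text)
    text = TAGS.sub("", text)
    text = html.unescape(text)  # &nbsp; -> U+00A0, &#8217; -> U+2019, etc.
    # str.split() treats U+00A0 as whitespace, so non-breaking spaces collapse too.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Entities are decoded only after tags are removed, so an escaped `&lt;script&gt;` in post text survives as readable text instead of being stripped as markup.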

Results

The Dynamic llms.txt Generator plugin provides several practical benefits for content-heavy WordPress websites.

A Single Structured Content Endpoint

External tools can access a website’s core content through one URL instead of crawling multiple pages.

Cleaner Data for LLM-Based Workflows

Content is exported in a simplified format that separates metadata, headings, and main content, making it easier for AI tools to process.

Predictable Website Performance

Since heavy processing happens during regeneration (scheduled, manual, or triggered by a content change) rather than during visitor requests, the plugin avoids performance issues on the live site.

Easy Administration

Everything can be managed directly from the WordPress dashboard without requiring additional infrastructure or external services.

Plugin Availability

After internal testing and real-world usage across multiple WordPress sites, the solution evolved into a publicly available WordPress plugin called Dynamic llms.txt Generator.

[Screenshot: WordPress plugin listing]

The plugin is now available in the official WordPress plugin repository and can be installed directly from the WordPress dashboard. 

Plugin page: https://wordpress.org/plugins/dynamic-llms-txt-generator/

Conclusion

Dynamic llms.txt Generator provides a simple yet powerful way for WordPress websites to expose their content in a format that is easier for AI systems and automated tools to consume. 

By combining structured exports, clean text extraction, and flexible configuration options, the plugin transforms a typical WordPress website into a machine-friendly content feed without affecting the normal browsing experience for users. 

About the Author - Rajeev

Rajeev is a WordPress developer and tech lead with more than 11 years of experience building high-performance websites across travel, education, real estate, and e-commerce. He focuses on speed, stability, and scalability, and enjoys creating API-driven solutions that help businesses extend their digital capabilities in smart and meaningful ways. He has a strong interest in integrating third-party systems and building custom functionality that supports long-term growth and real business outcomes. Outside of work, he is yet another Federer fan who dreams of someday watching Federer play at center court. In his free time, he loves endurance cycling and playing badminton!
