llms.txt vs robots.txt

The two files solve different problems. robots.txt is a long-established standard (RFC 9309) that expresses crawler access rules. llms.txt is a newer proposal: a Markdown file at the site root that gives AI systems a curated map of useful site content.

The short version

Do not use llms.txt as a blocking mechanism. Put allow and disallow rules in robots.txt. Use llms.txt only as an experimental context layer that describes important pages and clean Markdown resources.
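
For example, access rules for AI crawlers belong in robots.txt and look like any other robots.txt rules. A minimal sketch follows; GPTBot and ClaudeBot are published crawler tokens (OpenAI and Anthropic respectively), but check each vendor's documentation for current names, and treat the /private/ path as a placeholder.

User-agent: GPTBot
Disallow: /private/

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml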

What each file does

/robots.txt
Primary job: tell crawlers which URLs they may request.
What it is not: a guaranteed way to remove a page from search results.

/llms.txt
Primary job: provide a concise Markdown guide with links to high-signal resources.
What it is not: a ranking guarantee, an access-control protocol, or a replacement for robots.txt.

/sitemap.xml
Primary job: list indexable pages for search engines.
What it is not: a curated explanation of which pages are best for AI context.
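
For contrast with llms.txt, a minimal sitemap.xml is a pure machine-readable inventory with no curation or access signals; the URLs and date below are placeholders.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/docs</loc>
  </url>
</urlset>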

A minimal llms.txt draft

A draft like the one below works for a product or documentation site, but it should sit alongside clear robots.txt rules rather than replace them.

# Your Site

> One sentence explaining what the site is and who it serves.

## Core pages
- [Product overview](https://example.com/): Short description of the main product.
- [Documentation](https://example.com/docs): The best starting point for technical users.
- [Pricing](https://example.com/pricing): Current plans and limits.

## Optional
- [Blog](https://example.com/blog): Secondary articles and updates.

How CrawlerSignal uses this

CrawlerSignal checks whether a site serves robots.txt, llms.txt, and sitemap.xml, then explains whether each file is doing its own job. A missing llms.txt is not automatically high risk; confusing it with crawler access control is the bigger issue.
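
A sketch of that kind of presence check in Python follows. It illustrates the idea only, not CrawlerSignal's implementation; check_site is a hypothetical helper and example.com is a placeholder.

import urllib.request

FILES = ["/robots.txt", "/llms.txt", "/sitemap.xml"]

def check_site(base_url: str) -> dict[str, bool]:
    # Hypothetical helper: report which of the three files a site serves.
    results = {}
    for path in FILES:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                results[path] = resp.status == 200
        except Exception:
            # HTTPError (4xx/5xx) and network failures both count as missing.
            results[path] = False
    return results

if __name__ == "__main__":
    for path, present in check_site("https://example.com").items():
        print(f"{path}: {'present' if present else 'missing'}")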
