AI file guide
llms.txt vs robots.txt
The two files solve different problems. robots.txt expresses crawler access rules. llms.txt is a proposed Markdown file for giving AI systems a curated map of useful site content.
The short version
Do not use llms.txt as a blocking mechanism. Put Allow and Disallow rules in robots.txt, as in the sketch below. Use llms.txt only as an experimental context layer that describes important pages and clean Markdown resources.
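A minimal robots.txt sketch of that separation; "ExampleAIBot" and the blocked path are illustrative placeholders, not a recommended policy:

```
# Access rules live here, not in llms.txt.
# ExampleAIBot and /private/ are illustrative placeholders.
User-agent: *
Allow: /

User-agent: ExampleAIBot
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
```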
What each file does
| File | Primary job | What it is not |
|---|---|---|
| /robots.txt | Tell crawlers which URLs they may request. | A guaranteed way to remove a page from search results. |
| /llms.txt | Provide a concise Markdown guide and links to high-signal resources. | A ranking guarantee, access-control protocol, or replacement for robots.txt. |
| /sitemap.xml | List indexable pages for search engines. | A curated explanation of which pages are best for AI context. |
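For contrast with llms.txt's curated guide, a minimal sitemap.xml entry; the URL and date are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A flat list of indexable URLs; no curation or descriptions. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/docs</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
</urlset>
```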
A minimal llms.txt draft
The draft below is useful for a product or documentation site, but it should sit beside clear robots.txt rules rather than replace them.
```markdown
# Your Site

> One sentence explaining what the site is and who it serves.

## Core pages

- [Product overview](https://example.com/): Short description of the main product.
- [Documentation](https://example.com/docs): The best starting point for technical users.
- [Pricing](https://example.com/pricing): Current plans and limits.

## Optional

- [Blog](https://example.com/blog): Secondary articles and updates.
```
How CrawlerSignal uses this
CrawlerSignal checks whether a site has robots.txt, llms.txt, and sitemap.xml, then explains whether each file is doing its separate job. A missing llms.txt is not automatically high risk; confusing it with crawler access control is the bigger problem.
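As a sketch of that kind of presence check, not CrawlerSignal's actual implementation: a small Python script that tests whether each file responds at a site's origin. The function name, User-Agent string, and timeout are assumptions.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

FILES = ("/robots.txt", "/llms.txt", "/sitemap.xml")

def check_site_files(origin: str) -> dict[str, bool]:
    """Report which of the three files respond with HTTP 200 at the origin.

    A presence check only; it does not validate the files' contents.
    """
    results = {}
    for path in FILES:
        url = origin.rstrip("/") + path
        req = Request(url, headers={"User-Agent": "file-check-sketch"})
        try:
            with urlopen(req, timeout=10) as resp:
                results[path] = resp.status == 200
        except (HTTPError, URLError):
            results[path] = False
    return results

if __name__ == "__main__":
    for path, present in check_site_files("https://example.com").items():
        print(f"{path}: {'found' if present else 'missing'}")
```

A 200 response only proves the file exists; whether the robots.txt rules and llms.txt content are doing their separate jobs still requires looking at what the files actually say.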