Pick Balanced if unsure
It keeps useful discovery open while handling training crawlers separately. Switch modes only when you know the tradeoff you want.
Pre-launch crawler check
Paste a URL. CrawlerSignal checks the public files search crawlers read, explains what is blocked in plain language, and gives you the first fix to review before you publish.
How it works
The score is a summary. The numbered recommendations tell you what to fix first and why it matters.
robots.txt can affect search traffic. Treat generated drafts as a starting point, not a final policy.
Audit output
https://yourdomain.com/robots.txt and confirm the public file changed.
Crawler matrix
| Bot | Company | Use | Status | Rule |
|---|---|---|---|---|
| Run an audit to populate crawler rules. | | | | |
Policy kit
https://yourdomain.com/robots.txt and confirm the public file changed.
Plain terms
Visibility: Whether people can discover your public pages through search engines or AI search tools.
robots.txt: A public rule file at your site root that tells crawlers which areas they may or may not request.
Allow and Disallow: Allow means crawlers may request a path. Disallow: / usually means do not crawl the whole site.
sitemap.xml: A list of important URLs that helps search engines discover and revisit pages.
llms.txt: An optional context map for AI tools. It does not block crawlers or guarantee ranking.
Discovery vs. training bots: Discovery bots help users find pages. Training bots are closer to model or dataset use. Decide on them separately.
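To make the Allow and Disallow terms concrete, here is a minimal sketch that uses Python's standard `urllib.robotparser` to ask whether a given crawler may request a URL under a sample policy. The policy text, the `may_request` helper, and the `ExampleTrainingBot` name are illustrative assumptions, not CrawlerSignal output.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy for illustration only.
# "ExampleTrainingBot" is a made-up name, not a real crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""


def may_request(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the policy text lets user_agent request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A discovery crawler falls under the "*" group: open except /drafts/.
search_ok = may_request(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/pricing")
drafts_ok = may_request(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/drafts/post")

# The training crawler has its own group, so it is decided separately.
training_ok = may_request(SAMPLE_ROBOTS_TXT, "ExampleTrainingBot", "https://example.com/pricing")
```

The standard-library parser is only an approximation of how any particular crawler reads the file. Real crawlers may resolve overlapping rules differently, so treat the result as a sanity check.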
Guides
Separate model training from ChatGPT Search visibility before you edit robots.txt.
Google controls: Google-Extended. Control Gemini training and grounding use without confusing it with Google Search indexing.
AI files: llms.txt vs robots.txt. Use robots.txt for access rules and llms.txt as an experimental context map.
Reality check: Is llms.txt worth it? Decide when the file helps, when it is hype, and what to audit first.
Cloudflare controls: Managed robots.txt. Check the live robots.txt response before you assume your origin file is the full policy.
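One way to check the live response against your origin file is sketched below. It assumes you have the origin file's text on hand. The helper names (`fetch_live_robots`, `normalized_rules`, `extra_live_rules`) and the user-agent string are hypothetical. The comparison is a crude line-level diff that ignores rule grouping, so it only flags lines that exist publicly but not in your origin copy.

```python
from urllib.request import Request, urlopen


def fetch_live_robots(site_root: str, timeout: float = 10.0) -> str:
    """Fetch the publicly served robots.txt (performs a network request)."""
    url = site_root.rstrip("/") + "/robots.txt"
    # Hypothetical user-agent string for this sketch.
    req = Request(url, headers={"User-Agent": "robots-check-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def normalized_rules(text: str) -> list[str]:
    """Strip comments and blank lines so formatting noise is ignored."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            rules.append(line)
    return rules


def extra_live_rules(live: str, origin: str) -> list[str]:
    """Lines served publicly that do not appear in the origin file."""
    origin_rules = set(normalized_rules(origin))
    return [rule for rule in normalized_rules(live) if rule not in origin_rules]


# Offline example: the live file carries a group the origin file lacks.
origin_text = "User-agent: *\nAllow: /\n"
live_text = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
added = extra_live_rules(live_text, origin_text)

# Against a real site (not run here):
# added = extra_live_rules(fetch_live_robots("https://yourdomain.com"), origin_text)
```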
FAQ
No. Treat it as an experimental AI-readable site map. The crawler rules that actually express allow/block choices still live in robots.txt.
If you are unsure, choose Balanced. It keeps useful search discovery open while making a separate decision for training-oriented AI crawlers. Choose Visibility when discovery matters most, and Protect when content control matters more than exposure.
Start with the numbered recommendations. Usually the order is: confirm the homepage can be fetched, make sure search crawlers are not blocked, add or fix sitemap.xml, then decide whether llms.txt is worth maintaining.
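The usual order described above can be sketched as a small prioritization step. The finding keys and the `recommendations` function are hypothetical names chosen for this example. The real audit may use different checks and wording.

```python
# Checks in the usual priority order; keys are hypothetical.
CHECK_ORDER = [
    ("homepage_fetchable", "Confirm the homepage can be fetched."),
    ("search_crawlers_allowed", "Make sure search crawlers are not blocked."),
    ("sitemap_ok", "Add or fix sitemap.xml."),
    ("llms_txt_decided", "Decide whether llms.txt is worth maintaining."),
]


def recommendations(findings: dict[str, bool]) -> list[str]:
    """Number the failing checks, keeping the priority order.

    A check missing from findings is treated as failing.
    """
    failing = [msg for key, msg in CHECK_ORDER if not findings.get(key, False)]
    return [f"{n}. {msg}" for n, msg in enumerate(failing, start=1)]


example = recommendations({
    "homepage_fetchable": True,
    "search_crawlers_allowed": False,
    "sitemap_ok": False,
    "llms_txt_decided": True,
})
```

Here `example` lists the search-crawler fix first and the sitemap fix second, because earlier checks outrank later ones regardless of the order findings arrive in.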
A medium score usually means the site is missing explicit crawler policy files, not that the site is unsafe. CrawlerSignal checks whether useful search discovery stays open while unwanted AI crawler access is handled clearly.
That is the balanced default: keep ChatGPT search discovery open while making a separate training crawl choice. Review it with your legal and content strategy constraints.
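As a rough illustration of what a balanced draft could look like, the sketch below builds robots.txt text that blocks a list of training crawlers and leaves every other crawler open. The `balanced_robots_txt` function is hypothetical, and the choice of `GPTBot` as the training crawler to block is an assumption for this example. `OAI-SearchBot` is not named in the draft, so it falls under the open `*` group. The actual generated draft may differ.

```python
from typing import Optional


def balanced_robots_txt(training_bots: list[str], sitemap_url: Optional[str] = None) -> str:
    """Build a draft: block the named training bots, leave everything else open."""
    lines: list[str] = []
    for bot in training_bots:
        # One group per training crawler, blocked site-wide.
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    # Every other crawler, including search crawlers, stays open.
    lines += ["User-agent: *", "Allow: /"]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


draft = balanced_robots_txt(["GPTBot"], "https://yourdomain.com/sitemap.xml")
```

As with any generated draft, review the output against your legal and content strategy constraints before publishing it.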
The beta records lightweight product events such as page views, scan starts, scan success or failure, copy actions, and downloads. It does not create accounts or save a scan history database.
No. It reads public URLs from the outside. CDN rules, WAF settings, and real bot traffic logs require platform access and belong in a later paid monitoring product.