Cloudflare crawler controls

# Cloudflare managed robots.txt can change what crawlers see
If your site uses Cloudflare, the public /robots.txt response may include managed AI crawler rules in addition to, or instead of, the file your origin server serves.
## The short version
Do not audit only the file in your repo or CMS. Audit the live /robots.txt response. Cloudflare can prepend managed rules for known AI crawlers, create a robots file when your origin has none, and expose Content Signals that newer tools may understand differently from older SEO validators.
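One way to catch that divergence is to diff the live response against the file you ship. Here is a minimal sketch using only Python's standard library; `LIVE_URL` and `ORIGIN_FILE` are hypothetical placeholders, not values any tool mandates:

```python
import difflib
import urllib.request

# Hypothetical values: substitute your own domain and repo path.
LIVE_URL = "https://www.example.com/robots.txt"
ORIGIN_FILE = "public/robots.txt"  # the file your repo or CMS serves

with urllib.request.urlopen(LIVE_URL) as resp:
    live = resp.read().decode("utf-8", errors="replace").splitlines()

with open(ORIGIN_FILE, encoding="utf-8") as f:
    origin = f.read().splitlines()

# Lines marked '+' exist only in the live response: rules the edge
# added in front of (or instead of) your origin file.
diff = list(difflib.unified_diff(origin, live, "origin", "live", lineterm=""))
print("\n".join(diff) if diff else "Live response matches the origin file.")
```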
## What Cloudflare may add
| Situation | What can happen | Why it matters |
|---|---|---|
| Origin has a robots.txt file | Cloudflare may prepend managed AI crawler rules before your existing file. | Your repo file and the public crawler policy can differ. |
| Origin has no robots.txt file | Cloudflare may generate one for known AI crawler preferences. | A site can appear to have crawler policy even if the origin has none. |
| Content Signals are enabled | The live file may include usage signals for search, AI input, or AI training. | Some validators may warn even when Search crawling is not harmed. |
## The audit order I would use
- Open the live /robots.txt URL in a browser, not just your source file.
- Look for a Cloudflare managed content marker or AI-specific user-agent blocks (see the sketch after this list).
- Confirm whether the site owner intended to block training crawlers, search crawlers, or both.
- Check sitemap discovery separately; managed robots rules do not replace a clean sitemap.
- Use Cloudflare AI Crawl Control or WAF rules when you need enforcement, not just a preference file.
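A sketch that scripts those checks with Python's standard library: fetch the live file (step 1), scan for a managed marker (step 2), report the per-agent policy a parser actually derives (step 3), and list sitemap declarations (step 4). The domain, the agent list, and the loose `"cloudflare"` substring test are all illustrative assumptions; check your live file for the exact marker wording:

```python
import urllib.request
from urllib.robotparser import RobotFileParser

SITE = "https://www.example.com"  # hypothetical; use the site under audit
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "CCBot", "Google-Extended"]

# Step 1: fetch the live response, not the repo file.
with urllib.request.urlopen(f"{SITE}/robots.txt") as resp:
    body = resp.read().decode("utf-8", errors="replace")

# Step 2: a loose marker check (assumption: the managed block mentions
# Cloudflare somewhere; verify the exact wording on your own live file).
if "cloudflare" in body.lower():
    print("Live file mentions Cloudflare; managed rules are likely present.")

rp = RobotFileParser()
rp.parse(body.splitlines())

# Step 3: the policy a spec-following crawler would derive, per agent.
for agent in AI_AGENTS:
    verdict = "allowed" if rp.can_fetch(agent, f"{SITE}/") else "disallowed"
    print(f"{agent}: {verdict} at /")

# Step 4: sitemap discovery is separate from crawl rules.
print("Sitemaps:", rp.site_maps() or "none declared in robots.txt")
```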
## A policy that is clear but not overbroad
For many SaaS, docs, and publisher sites, the goal is not "block every AI-looking thing." It is usually more specific: keep normal search discovery open, make training preferences explicit, and avoid accidentally blocking user-triggered fetchers that help people find or summarize your public pages. A minimal policy along those lines:
```
User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
```
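Grouping rules in robots.txt are easy to get subtly wrong, so it is worth sanity-checking that a policy like this does what you intend before deploying it. A quick check with Python's stdlib parser, using the example file above verbatim:

```python
from urllib.robotparser import RobotFileParser

POLICY = """\
User-agent: Googlebot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(POLICY.splitlines())

# Search discovery stays open; the training crawler is refused.
assert rp.can_fetch("Googlebot", "https://www.example.com/docs")
assert rp.can_fetch("OAI-SearchBot", "https://www.example.com/docs")
assert not rp.can_fetch("GPTBot", "https://www.example.com/docs")

# Agents with no matching group (and no "*" group) fall through to allowed.
assert rp.can_fetch("Bingbot", "https://www.example.com/docs")
print("Policy matches intent: search open, training crawler disallowed.")
```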
## How CrawlerSignal uses this
CrawlerSignal fetches the live robots response because that is what crawlers see. If Cloudflare is changing the public file, the audit should reflect the public policy, not just the origin file. The score should still be interpreted as a checklist, not a verdict.