CrawlerSignal

Pre-launch crawler check

Find out whether your site is blocking search crawlers before traffic drops

Paste a URL. CrawlerSignal checks the public files search crawlers read, explains what is blocked in plain language, and gives you the first fix to review before you publish.

Free beta. No account. Read-only public check. It cannot modify your site.

How it works

You do not need to know every acronym first

01

Pick Balanced if unsure

It keeps search discovery open while handling AI training crawlers separately. Switch modes only when you know which tradeoff you want.

02

Read the recommendations before the score

The score is a summary. The numbered recommendations tell you what to fix first and why it matters.

03

Review drafts before publishing

A robots.txt change can affect search traffic. Treat generated drafts as a starting point, not a final policy.

Signal score

Run an audit to check search visibility and AI crawler access. The audit reads four public resources: robots.txt, llms.txt, sitemap.xml, and the homepage.

Crawler matrix

Keep useful discovery separate from unwanted crawling

After an audit, the matrix lists each crawler by bot name, company, use, status, and the rule that applies to it.

Policy kit

Review the drafts before you publish

After an audit, the kit holds three files:

robots.txt snippet
llms.txt draft
audit.json

What to do after copying

  1. Confirm the goal: more search exposure or stronger content control.
  2. Ask the person responsible for SEO, CMS, CDN, Cloudflare, or server config to review the draft.
  3. After publishing, open https://yourdomain.com/robots.txt and confirm the public file changed.
  4. Resubmit your sitemap in Google Search Console and watch impressions over the next few days.
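Step 3 can also be scripted. The sketch below is a minimal example, assuming Python 3.9+ and a site served over HTTPS: it downloads the live file and lists draft lines that do not appear in it. `fetch_robots_txt`, `missing_lines`, and the `robots-check/0.1` user agent are hypothetical names chosen for illustration. The comparison is line by line only, so it does not confirm that a rule sits under the same User-agent group.

```python
import urllib.request


def fetch_robots_txt(domain: str, timeout: float = 10.0) -> str:
    """Download the public robots.txt for a domain such as "yourdomain.com"."""
    url = f"https://{domain}/robots.txt"
    request = urllib.request.Request(url, headers={"User-Agent": "robots-check/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def missing_lines(published: str, draft: str) -> list[str]:
    """Return non-comment draft lines that are absent from the published file.

    Rough check only: it ignores which User-agent group each rule belongs to.
    """
    published_lines = {line.strip() for line in published.splitlines()}
    missing = []
    for line in draft.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and stripped not in published_lines:
            missing.append(stripped)
    return missing


# Example usage (performs a network request):
# live = fetch_robots_txt("yourdomain.com")
# print(missing_lines(live, open("robots-draft.txt").read()))
```

An empty list means every rule in the draft is now live; anything returned is still missing from the public file.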

Plain terms

What these files and bots mean

Search visibility

Whether people can discover your public pages through search engines or AI search tools.

robots.txt

A public rule file at your site root that tells crawlers which areas they may or may not request.
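Because the file always sits at the root, you can work out which robots.txt applies to any page URL. Here is a minimal sketch using only Python's standard library. `robots_url` is a hypothetical helper name. Each host answers for itself, so a page on `shop.example.com` is governed by that subdomain's own file, not by the one on `example.com`.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location that governs the given page URL.

    Crawlers look for the file at the root of the page's scheme and host,
    so the path, query string, and fragment are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?ref=nav"))
print(robots_url("https://shop.example.com/cart"))
```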

Allow / Disallow

Allow means the matching crawlers may request a path. Disallow: / asks the matching crawlers not to request any path on the site, unless a more specific Allow rule makes an exception.
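To see how Allow and Disallow combine, you can test a set of rules with Python's built-in `urllib.robotparser`. The rules and the `ExampleBot` name below are hypothetical. One caveat: Python's parser applies the first matching rule in file order, while Google and RFC 9309 pick the most specific match. Listing the Allow line first makes both readings agree here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every crawler out of the whole site
# except the /public/ area.
RULES = """\
User-agent: *
Allow: /public/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# can_fetch(user_agent, url) answers "may this crawler request this URL?"
results = {
    path: parser.can_fetch("ExampleBot", "https://example.com" + path)
    for path in ["/", "/pricing", "/public/guide"]
}
for path, allowed in results.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

This prints `blocked` for `/` and `/pricing` and `allowed` for `/public/guide`.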

sitemap.xml

A list of important URLs that helps search engines discover and revisit pages.
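A sitemap is plain XML in the sitemaps.org namespace, so listing its URLs takes only a few lines. This sketch uses Python's `xml.etree.ElementTree`. The sample sitemap and the `list_sitemap_urls` name are made up for illustration. For sitemaps you do not control, a hardened parser such as defusedxml is the safer choice.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-05-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/pricing</loc>
  </url>
</urlset>
"""


def list_sitemap_urls(xml_bytes: bytes) -> list[str]:
    """Return the <loc> of every <url> entry in a urlset sitemap.

    A sitemap index file uses <sitemap> entries instead and is not handled here.
    """
    root = ET.fromstring(xml_bytes)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


print(list_sitemap_urls(SAMPLE_SITEMAP))
```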

llms.txt

An optional context map for AI tools. It does not block crawlers or guarantee ranking.
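The community llms.txt proposal describes a small Markdown file with three parts: an H1 with the site name, a blockquote summary, and H2 sections listing links with optional notes. Below is a minimal sketch that assembles a draft in that layout. `build_llms_txt` and the sample content are hypothetical.

```python
def build_llms_txt(
    site_name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Assemble an llms.txt draft: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            # Notes are optional; append them after a colon when present.
            lines.append(f"{entry}: {note}" if note else entry)
    return "\n".join(lines) + "\n"


draft = build_llms_txt(
    "Example Co",
    "Example Co sells hypothetical widgets.",
    {"Docs": [("Pricing", "https://example.com/pricing", "Plans and limits")]},
)
print(draft)
```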

Discovery vs training bots

Discovery bots help users find your pages, for example through search results. Training bots collect content for AI model training or datasets. Decide on each group separately.
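One way to keep the two groups apart is to map crawler tokens to a purpose. The mapping below is illustrative, not exhaustive. Check each token against the vendor's own crawler documentation. The User-Agent header can also be spoofed, so treat a match as a claim rather than proof of identity. `crawler_purpose` is a hypothetical helper.

```python
# Illustrative mapping only; verify each token against the vendor's
# own crawler documentation before relying on it.
CRAWLER_PURPOSE = {
    "googlebot": "discovery",
    "bingbot": "discovery",
    "oai-searchbot": "discovery",
    "gptbot": "training",
    "ccbot": "training",
}


def crawler_purpose(user_agent: str) -> str:
    """Classify a User-Agent header as discovery, training, or unknown."""
    ua = user_agent.lower()
    for token, purpose in CRAWLER_PURPOSE.items():
        if token in ua:
            return purpose
    return "unknown"


print(crawler_purpose("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(crawler_purpose("Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"))
```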

Guides

Turn crawler confusion into checklists

FAQ

The boring caveats that keep this useful

Does llms.txt guarantee AI search ranking?

No. Treat it as an experimental AI-readable site map. The crawler rules that actually express allow/block choices still live in robots.txt.

Which policy mode should I choose?

If you are unsure, choose Balanced. It keeps useful search discovery open while making a separate decision for training-oriented AI crawlers. Choose Visibility when discovery matters most, and Protect when content control matters more than exposure.
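As a rough illustration of how three modes could translate into robots.txt groups, here is a sketch. The bot lists and the rules for each mode are assumptions made for this example. They are not CrawlerSignal's actual policy logic, and `draft_robots_txt` is a hypothetical name.

```python
# Hypothetical bot groups and mode rules, for illustration only.
# Google-Extended is a robots.txt control token, not a separate crawler.
TRAINING_TOKENS = ["GPTBot", "CCBot", "Google-Extended"]
AI_SEARCH_TOKENS = ["OAI-SearchBot"]

BLOCKED_BY_MODE = {
    "visibility": [],
    "balanced": TRAINING_TOKENS,
    "protect": TRAINING_TOKENS + AI_SEARCH_TOKENS,
}


def draft_robots_txt(mode: str, sitemap_url: str) -> str:
    """Build a robots.txt draft for one of the three policy modes."""
    if mode not in BLOCKED_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    groups = [f"User-agent: {token}\nDisallow: /" for token in BLOCKED_BY_MODE[mode]]
    # Every other crawler, including classic search engines, keeps full access.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + f"\n\nSitemap: {sitemap_url}\n"


print(draft_robots_txt("balanced", "https://example.com/sitemap.xml"))
```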

What should I do after the audit?

Start with the numbered recommendations. Usually the order is: confirm the homepage can be fetched, make sure search crawlers are not blocked, add or fix sitemap.xml, then decide whether llms.txt is worth maintaining.
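That ordering works like a checklist: the first failed check is the next thing to fix. Here is a minimal sketch. The check names and the `next_fix` helper are hypothetical, and the audit's real output format may differ.

```python
from typing import Dict, Optional

# Hypothetical check names, listed in the order suggested in the answer above.
PRIORITY = [
    ("homepage_fetchable", "Confirm the homepage can be fetched."),
    ("search_crawlers_allowed", "Stop blocking search crawlers in robots.txt."),
    ("sitemap_valid", "Add or fix sitemap.xml."),
    ("llms_txt_decided", "Decide whether llms.txt is worth maintaining."),
]


def next_fix(results: Dict[str, bool]) -> Optional[str]:
    """Return the advice for the first failed check, or None if all passed."""
    for check, advice in PRIORITY:
        # A missing result counts as not yet passed.
        if not results.get(check, False):
            return advice
    return None


print(next_fix({"homepage_fetchable": True, "search_crawlers_allowed": True}))
```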

Why does a normal site sometimes show a medium score?

A medium score usually means the site is missing explicit crawler policy files, not that the site is unsafe. CrawlerSignal checks whether useful search discovery stays open while unwanted AI crawler access is handled clearly.

Should I block GPTBot and allow OAI-SearchBot?

That is the Balanced default: keep ChatGPT search discovery open by allowing OAI-SearchBot, and make a separate choice about GPTBot, OpenAI's training crawler. Review that choice against your legal and content strategy constraints.
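Before publishing, you can check that a draft does what you expect. This sketch uses Python's `urllib.robotparser` on a hypothetical Balanced-style file. Note that Python picks a User-agent group by case-insensitive substring match, which simplifies how individual crawlers choose their group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical Balanced-style draft: block OpenAI's training crawler,
# keep its search crawler and every other crawler allowed.
BALANCED = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(BALANCED.splitlines())

page = "https://example.com/articles/launch"
for bot in ["GPTBot", "OAI-SearchBot", "Googlebot"]:
    print(bot, parser.can_fetch(bot, page))
```

This prints `False` for GPTBot and `True` for OAI-SearchBot and Googlebot.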

What data does CrawlerSignal track?

The beta records lightweight product events such as page views, scan starts, scan success or failure, copy actions, and downloads. It does not create accounts or save a scan history database.

Can CrawlerSignal see Cloudflare managed robots.txt or server logs?

No. It reads public URLs from the outside. CDN rules, WAF settings, and real bot traffic logs require platform access and belong in a later paid monitoring product.