AI Search  ·  May 4, 2026  ·  6 min read

How to Check If ChatGPT Can Find Your Website

Millions of businesses are unknowingly blocking ChatGPT from reading their websites. A single line in robots.txt is all it takes. Here's how to check if you're affected, and how to fix it in under five minutes.

Why This Matters

When someone asks ChatGPT a question like "what's the best project management software for small teams", ChatGPT generates its answer based on content it has indexed from the web. If your site blocks its crawler, your business simply doesn't exist in that conversation, no matter how good your product is.

This isn't a minor technical issue. As AI search grows, the sites that get cited by ChatGPT, Perplexity and Gemini are building brand visibility that compounds over time. The ones that don't are being left out entirely.

โš ๏ธ Studies suggest over 25% of websites block at least one major AI crawler โ€” many unknowingly, by adding overly broad rules to their robots.txt.

Step 1: Check Your robots.txt

Every website has a robots.txt file that tells crawlers what they can and can't access. Visit yours right now:

https://yourwebsite.com/robots.txt

You're looking for any of these patterns:

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /

The second example, User-agent: * followed by Disallow: /, is particularly dangerous. It blocks every bot: all AI crawlers, every search engine, and all other automated tools.
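You can also run this check programmatically. The sketch below uses Python's standard-library urllib.robotparser against an inline copy of the blocking rule from the first example; it is an illustration, not a live check of any real site.

```python
from urllib.robotparser import RobotFileParser

# The blocking pattern from the first example above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# can_fetch() returns False when the named bot is disallowed for that URL.
gptbot_blocked = not rp.can_fetch("GPTBot", "https://yourwebsite.com/")
print("GPTBot blocked:", gptbot_blocked)  # GPTBot blocked: True
```

To check a live site instead of an inline string, call rp.set_url("https://yourwebsite.com/robots.txt") followed by rp.read() before testing with can_fetch().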

Step 2: Know Which Bots to Allow

Here are the major AI crawlers and what they power:

Bot Name          Powers                         You Want This
GPTBot            ChatGPT (OpenAI)               ✓ Allowed
ChatGPT-User      ChatGPT browsing               ✓ Allowed
ClaudeBot         Claude (Anthropic)             ✓ Allowed
anthropic-ai      Claude training                ✓ Allowed
PerplexityBot     Perplexity AI                  ✓ Allowed
Google-Extended   Google Gemini + AI Overviews   ✓ Allowed
cohere-ai         Cohere AI                      ✓ Allowed
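To check every bot in the table at once, you can loop over their user-agent strings with the same standard-library parser. This sketch runs against a sample robots.txt that blocks only GPTBot; the bot names come straight from the table above.

```python
from urllib.robotparser import RobotFileParser

# User-agent strings from the table above.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "cohere-ai",
]

# Sample rules: GPTBot blocked, everyone else allowed.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(SAMPLE.splitlines())

# True means the bot may crawl the site root.
report = {bot: rp.can_fetch(bot, "https://yourwebsite.com/") for bot in AI_BOTS}
for bot, allowed in report.items():
    print(f"{bot:16} {'allowed' if allowed else 'BLOCKED'}")
```

Swap SAMPLE for your own robots.txt content (or use set_url/read as above) to get a per-bot report for your site.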

Step 3: Fix Your robots.txt

Here's a clean robots.txt that allows all major search engines and AI crawlers while keeping your private areas protected:

# Allow all search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: cohere-ai
Allow: /

# Allow everything else by default, protecting private areas.
# Note: Disallow rules must sit inside a User-agent group to take effect,
# so the private paths go here rather than floating at the end of the file.
User-agent: *
Disallow: /admin/
Disallow: /api/private/
Disallow: /wp-admin/
Allow: /

Sitemap: https://yourwebsite.com/sitemap.xml

💡 Replace /admin/, /api/private/ and /wp-admin/ with the actual paths you want to protect. Everything else should be open.
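You can verify the fixed file before deploying it. This sketch parses a trimmed version of the robots.txt above and confirms that GPTBot can reach public pages while /admin/ stays blocked for generic bots.

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the fixed robots.txt above.
FIXED = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
Disallow: /api/private/
Disallow: /wp-admin/
Allow: /
"""

rp = RobotFileParser()
rp.parse(FIXED.splitlines())

gptbot_ok = rp.can_fetch("GPTBot", "https://yourwebsite.com/pricing")
admin_blocked = not rp.can_fetch("SomeRandomBot", "https://yourwebsite.com/admin/")
print(gptbot_ok, admin_blocked)  # True True
```

One caveat worth knowing: under the robots standard, a bot obeys only its most specific matching group, so the GPTBot group above governs GPTBot entirely. If you want AI bots kept out of /admin/ as well, repeat the Disallow lines inside each bot's own group.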

Step 4: Go Beyond robots.txt

Allowing AI bots to crawl your site is the first step, but it's not enough on its own. For ChatGPT and Perplexity to actually cite you, your content needs to be:

Easy to extract

AI models prefer content that makes clear, quotable statements. Instead of vague marketing language ("we provide world-class solutions"), write directly ("SiteOracle is a website audit tool that scores your site across SEO, AEO, GEO and AI Visibility").

Structured with schema

JSON-LD structured data (Organization, Article, FAQPage, Product) helps AI understand what your content is about and who it's from. Without schema, AI has to guess.
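As an illustration, here's one way to generate an Organization JSON-LD block programmatically. The company name, URL, logo path, and social profile are placeholder values, not real data; substitute your own.

```python
import json

# Hypothetical example values -- substitute your organization's details.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://yourwebsite.com",
    "logo": "https://yourwebsite.com/logo.png",
    "sameAs": ["https://www.linkedin.com/company/example-co"],
}

# Wrap in the script tag that belongs in your page's <head>.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(organization, indent=2)
    + "\n</script>"
)
print(snippet)
```

The same pattern works for Article, FAQPage, and Product types: build a dict with the schema.org fields for that type and emit it inside a script tag of type application/ld+json.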

Authoritative

AI models cite sources they trust. External mentions (press coverage, backlinks from industry sites, social proof) all contribute to how likely AI is to reference you.

How to Automate This Check

Checking robots.txt manually is a one-time fix. But sites update their robots.txt, plugins sometimes add blocking rules automatically, and new AI bots emerge that you'd want to allow.

SiteOracle automatically checks all of this every time you run a scan: it reads your robots.txt, identifies every blocked AI crawler, and gives you a specific fix with the exact lines to add.

Check Your AI Visibility Now

SiteOracle scans your robots.txt, structured data, and content citability, then tells you exactly which AI bots can find you and which ones you're blocking.

Run Free AI Visibility Check →

Frequently Asked Questions

What is GPTBot?

GPTBot is OpenAI's web crawler. It crawls websites to gather content that may be used to train and improve ChatGPT. If you block GPTBot in your robots.txt, ChatGPT is less likely to reference or cite your content in its responses.

How do I check if my site blocks GPTBot?

Visit yoursite.com/robots.txt in your browser. Look for lines that say "User-agent: GPTBot" followed by "Disallow: /". If you see that, you're blocking ChatGPT. You can also use SiteOracle to run an automatic AI Visibility check that flags all blocked bots instantly.

Does allowing GPTBot guarantee ChatGPT will cite me?

Allowing GPTBot is necessary but not sufficient. ChatGPT also considers the quality, authority, and citability of your content. You need structured data, clear definition statements, topical depth, and external references pointing to your site.

Which AI bots should I allow in robots.txt?

For maximum AI search visibility, allow: GPTBot (ChatGPT), ClaudeBot (Claude/Anthropic), PerplexityBot (Perplexity), Google-Extended (Google AI Overviews), anthropic-ai, cohere-ai, and ChatGPT-User. SiteOracle checks all of these automatically.