Millions of businesses are unknowingly blocking ChatGPT from reading their websites. A single line in robots.txt is all it takes. Here's how to check if you're affected, and how to fix it in under five minutes.
When someone asks ChatGPT a question like "what's the best project management software for small teams", it builds its answer from web content it was trained on or can browse. If your site blocks its crawlers, your business is far less likely to appear in that conversation, no matter how good your product is.
This isn't a minor technical issue. As AI search grows, the sites that get cited by ChatGPT, Perplexity and Gemini are building brand visibility that compounds over time. The ones that don't are being left out entirely.
⚠️ Studies suggest over 25% of websites block at least one major AI crawler, many of them unknowingly, by adding overly broad rules to their robots.txt.
Most websites publish a robots.txt file at the root of the domain that tells crawlers what they can and can't access. Visit yours right now:
https://yourwebsite.com/robots.txt
You're looking for any of these patterns:
User-agent: GPTBot
Disallow: /
User-agent: *
Disallow: /
The second example (User-agent: * followed by Disallow: /) is particularly dangerous. It blocks every compliant bot that doesn't have its own group in the file, including AI crawlers, search engines, and other automated tools.
Here are the major AI crawlers and what they power:
| Bot Name | Powers | You Want This |
|---|---|---|
| GPTBot | ChatGPT (OpenAI) | ✅ Allowed |
| ChatGPT-User | ChatGPT browsing | ✅ Allowed |
| ClaudeBot | Claude (Anthropic) | ✅ Allowed |
| anthropic-ai | Claude training | ✅ Allowed |
| PerplexityBot | Perplexity AI | ✅ Allowed |
| Google-Extended | Google Gemini (model training and grounding) | ✅ Allowed |
| cohere-ai | Cohere AI | ✅ Allowed |
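
The manual check can also be scripted. The sketch below is a minimal illustration using Python's standard-library `urllib.robotparser`. The function name `blocked_ai_bots` and the `AI_BOTS` list (copied from the table above) are assumptions made for this example, and the parser's matching may differ in edge cases from what each crawler actually implements.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the table above.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "cohere-ai",
]


def blocked_ai_bots(robots_txt: str, bots=AI_BOTS) -> list[str]:
    """Return the bots that this robots.txt text bars from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The host is a placeholder; only the path ("/") matters for matching.
    return [bot for bot in bots
            if not parser.can_fetch(bot, "https://example.com/")]


if __name__ == "__main__":
    print(blocked_ai_bots("User-agent: GPTBot\nDisallow: /"))  # ['GPTBot']
    print(blocked_ai_bots("User-agent: *\nDisallow: /") == AI_BOTS)  # True
```

For a live site, the file's text could be fetched first (for example with `urllib.request.urlopen`) and passed to the same function.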
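Here's a clean robots.txt that allows all major search engines and AI crawlers while asking them to stay out of your private areas. robots.txt is advisory, not access control, so anything truly private still needs authentication. A crawler that matches a named group ignores the User-agent: * group, which is why the Disallow rules appear in both groups:
# Search engines and AI crawlers (one shared group)
User-agent: Googlebot
User-agent: Bingbot
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: cohere-ai
# Keep private areas out of crawls (listed before Allow for parsers that stop at the first match)
Disallow: /admin/
Disallow: /api/private/
Disallow: /wp-admin/
# Allow everything else
Allow: /
# Every other crawler gets the same rules
User-agent: *
Disallow: /admin/
Disallow: /api/private/
Disallow: /wp-admin/
Allow: /
Sitemap: https://yourwebsite.com/sitemap.xml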
💡 Replace /admin/, /api/private/ and /wp-admin/ with the actual paths you want to protect. Everything else should be open.
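
One detail is easy to get wrong in a file like this: under the Robots Exclusion Protocol, a crawler follows only the group that names it most specifically, so a bot with its own group ignores rules listed under `User-agent: *`. The sketch below, using Python's standard-library `urllib.robotparser`, checks individual paths per crawler. The helper `can_crawl`, the two sample files, and the `example.com` host are hypothetical, made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, bot: str, path: str) -> bool:
    """Report whether `bot` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Placeholder host; the parser only compares the path part.
    return parser.can_fetch(bot, "https://example.com" + path)


# GPTBot has its own group, so the Disallow under "*" never reaches it.
LEAKY = """\
User-agent: GPTBot
Allow: /
User-agent: *
Disallow: /admin/
"""

# Repeating the Disallow rule inside GPTBot's group closes the gap.
FIXED = """\
User-agent: GPTBot
Disallow: /admin/
Allow: /
User-agent: *
Disallow: /admin/
Allow: /
"""

if __name__ == "__main__":
    print(can_crawl(LEAKY, "GPTBot", "/admin/"))   # True: private path exposed
    print(can_crawl(FIXED, "GPTBot", "/admin/"))   # False
    print(can_crawl(FIXED, "GPTBot", "/pricing"))  # True
```

Python's parser applies the first rule that matches, whereas RFC 9309 specifies that the longest matching rule wins. Placing the Disallow lines before `Allow: /` gives the same result under both behaviours.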
Allowing AI bots to crawl your site is the first step, but it's not enough on its own. For ChatGPT and Perplexity to actually cite you, your content also needs to be quotable, structured, and trusted:
AI models prefer content that makes clear, quotable statements. Instead of vague marketing language ("we provide world-class solutions"), write directly ("SiteOracle is a website audit tool that scores your site across SEO, AEO, GEO and AI Visibility").
JSON-LD structured data (Organization, Article, FAQPage, Product) helps AI understand what your content is about and who it's from. Without schema, AI has to guess.
AI models cite sources they trust. External mentions (press coverage, backlinks from industry sites, social proof) all contribute to how likely AI is to reference you.
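
To make the structured-data point concrete, here is a minimal sketch that builds an Organization JSON-LD snippet with Python's standard `json` module. The helper name `organization_jsonld` and the `https://example.com` URL are placeholders assumed for illustration; the description string reuses the example sentence quoted above.

```python
import json


def organization_jsonld(name: str, url: str, description: str) -> str:
    """Build a <script> tag holding schema.org Organization markup."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
    }
    # Escape "</" so a stray "</script>" in the data cannot end the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(organization_jsonld(
        name="SiteOracle",
        url="https://example.com",  # placeholder, not the real site URL
        description="A website audit tool that scores your site across "
                    "SEO, AEO, GEO and AI Visibility.",
    ))
```

The resulting tag would typically go in the page's `<head>`. Article, FAQPage and Product markup follow the same pattern with their own schema.org properties.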
Checking robots.txt manually is a one-time fix. But sites update their robots.txt, plugins sometimes add blocking rules automatically, and new AI bots emerge that you'd want to allow.
SiteOracle automatically checks all of this every time you run a scan: it reads your robots.txt, identifies every blocked AI crawler, and gives you a specific fix with the exact lines to add.
SiteOracle scans your robots.txt, structured data, and content citability, and tells you exactly which AI bots can find you and which ones you're blocking.
GPTBot is OpenAI's web crawler. It crawls websites to gather content that may be used to train and improve ChatGPT. If you block GPTBot in your robots.txt, ChatGPT is less likely to reference or cite your content in its responses.
Visit yourwebsite.com/robots.txt in your browser. Look for a line that says "User-agent: GPTBot" or "User-agent: *" followed by "Disallow: /". If you see that, you're blocking ChatGPT. You can also use SiteOracle to run an automatic AI Visibility check that flags all blocked bots instantly.
Allowing GPTBot is necessary but not sufficient. ChatGPT also considers the quality, authority, and citability of your content. You need structured data, clear definition statements, topical depth, and external references pointing to your site.
For maximum AI search visibility, allow: GPTBot (ChatGPT), ChatGPT-User (ChatGPT browsing), ClaudeBot (Claude/Anthropic), anthropic-ai, PerplexityBot (Perplexity), Google-Extended (Google Gemini), and cohere-ai. SiteOracle checks all of these automatically.