The directory
AI engines change their crawler user agents more often than most sites update robots.txt; this table is reviewed quarterly (last review dated above).
| User agent | Operator | What it feeds | If you block it |
|---|---|---|---|
| GPTBot | OpenAI | Model training corpus | Future GPT models know less about you |
| OAI-SearchBot | OpenAI | ChatGPT search index | You drop out of ChatGPT search results |
| ChatGPT-User | OpenAI | Live fetches during user chats | ChatGPT cannot open your pages when asked |
| ClaudeBot | Anthropic | Training and index crawling | Future Claude models know less about you |
| Claude-Web | Anthropic | Live fetches during user chats | Claude cannot open your pages when asked |
| anthropic-ai | Anthropic | Legacy training agent | Belt-and-suspenders companion to ClaudeBot |
| PerplexityBot | Perplexity | Answer-engine index | You vanish from Perplexity citations |
| Google-Extended | Google | Gemini training (not Search) | Gemini training opt-out; Search unaffected |
| Applebot-Extended | Apple | Apple Intelligence training | Same trade as Google-Extended, Apple edition |
| Bingbot | Microsoft | Bing index, feeds Copilot | You exit both Bing and Copilot answers |
| CCBot | Common Crawl | Open web corpus used by many labs | You leave the default dataset of new models |
| Bytespider | ByteDance | Model training | Known to ignore robots.txt at times; blocking is partly symbolic |
| cohere-ai | Cohere | Model training | Enterprise-model exposure, minor for most |
| Amazonbot | Amazon | Alexa and Amazon AI surfaces | Alexa-adjacent answers lose you |
| Meta-ExternalAgent | Meta | Meta AI training and retrieval | Meta AI surfaces know less about you |
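Each row is a separate robots.txt token, so a rule aimed at one agent does not touch its siblings from the same operator. The sketch below uses Python's standard-library `urllib.robotparser` with an illustrative robots.txt that names only GPTBot. This parser approximates, but does not exactly reproduce, how each operator's crawler interprets rules.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opts out of OpenAI model training only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The three OpenAI agents from the table are matched independently.
for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    status = "allowed" if parser.can_fetch(agent, "/") else "blocked"
    print(f"{agent:<15}{status}")
```

Only GPTBot is blocked. OAI-SearchBot and ChatGPT-User have no group of their own and no `User-agent: *` group to fall back on, so they stay allowed: opting out of one OpenAI row leaves the other two in place.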
The decision framework
The blanket question "should I block AI crawlers" hides three separate trades.
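- Trainers (GPTBot, ClaudeBot, anthropic-ai, Google-Extended, Applebot-Extended, CCBot, cohere-ai, Bytespider, Meta-ExternalAgent): feed future model training.
- Indexers (OAI-SearchBot, PerplexityBot, Bingbot): feed search and answer-engine indexes.
- Fetchers (ChatGPT-User, Claude-Web): open your pages live during a user's chat.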
Blocking a trainer is an IP position. Blocking an indexer removes you from that engine's search answers. Blocking a fetcher is hanging up on a buyer who just asked about you.
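The three-way split maps directly onto robots.txt groups. The sketch below uses a hypothetical helper, `build_robots_txt`, to produce a trainers-only opt-out. Every trainer token shares one `Disallow: /` group, while indexers and fetchers fall through to a permissive `User-agent: *` group. The agent lists come from the directory table, with anthropic-ai grouped as a trainer because the table calls it a legacy training agent. The output layout is an assumption, not a recommended file.

```python
from urllib.robotparser import RobotFileParser

# Agent tokens grouped by kind, taken from the directory table.
# anthropic-ai is treated as a trainer (the table calls it a legacy training agent).
KINDS = {
    "trainer": ["GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended",
                "Applebot-Extended", "CCBot", "cohere-ai", "Bytespider",
                "Meta-ExternalAgent"],
    "indexer": ["OAI-SearchBot", "PerplexityBot", "Bingbot"],
    "fetcher": ["ChatGPT-User", "Claude-Web"],
}


def build_robots_txt(blocked_kinds):
    """Hypothetical helper: disallow every agent of the given kinds.

    All blocked agents share a single group (several User-agent lines,
    one Disallow). Everything else falls through to the permissive
    catch-all group at the end.
    """
    lines = []
    for kind, agents in KINDS.items():
        if kind in blocked_kinds:
            lines.extend(f"User-agent: {agent}" for agent in agents)
    if lines:
        lines.append("Disallow: /")
        lines.append("")  # a blank line closes the group
    lines.extend(["User-agent: *", "Allow: /"])
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt({"trainer"})
print(robots_txt)

# Check the result with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for kind, agents in KINDS.items():
    blocked = [a for a in agents if not parser.can_fetch(a, "/")]
    print(f"{kind}: {len(blocked)} of {len(agents)} blocked")
```

Passing an empty set yields only the catch-all group, which is the allow-everything position. As the table notes, the Bytespider line may be partly symbolic whichever set you pass.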
For a B2B company selling on expertise, the resolution is usually: allow everything, then make the crawlable surface excellent. That is the position this site takes, and the benchmark data shows it is now the norm: bot access is the highest-scoring section in the field, averaging 98/100. The control plane is solved; differentiation has moved to what crawlers find once inside.
Implementation notes
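Giving an AI agent its own group changes how it reads robots.txt, even when that group only allows access. A crawler that finds a group naming it follows that group alone and ignores the `User-agent: *` rules, because RFC 9309 treats `*` as a fallback only. The sketch below assumes an illustrative site that keeps `/admin/` off limits for everyone. It shows GPTBot escaping that rule once it is given a bare `Allow: /` group.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: a general rule plus an explicit "welcome" group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot matches its own group, so the catch-all Disallow no longer applies to it.
for agent in ("GPTBot", "OAI-SearchBot"):
    status = "allowed" if parser.can_fetch(agent, "/admin/settings") else "blocked"
    print(f"{agent:<15}/admin/settings {status}")
```

If you do want an agent-specific group, repeat the shared `Disallow` lines inside it. Under an allow-everything policy, the simpler option is to add no agent-specific group at all.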
What to do next
Open your robots.txt now and check it against the table; if you cannot say which of the three kinds each rule affects, run the audit and let the bot-access section grade it for you.
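For a rough self-check before the audit, the standard library is enough. The sketch below assumes a hypothetical `audit` helper and an agent-to-kind mapping taken from the table, and reports which agents a robots.txt blocks at the site root. To check a live site instead of a pasted string, call `RobotFileParser.set_url()` and then `read()` to fetch the file. Python's parser matches user agents by substring and applies the first matching rule, so treat its verdicts as approximate compared with each crawler's own parser.

```python
from urllib.robotparser import RobotFileParser

# Agent -> kind, per the directory table and the decision framework.
AGENT_KIND = {
    "GPTBot": "trainer", "ClaudeBot": "trainer", "anthropic-ai": "trainer",
    "Google-Extended": "trainer", "Applebot-Extended": "trainer",
    "CCBot": "trainer", "cohere-ai": "trainer", "Bytespider": "trainer",
    "Meta-ExternalAgent": "trainer",
    "OAI-SearchBot": "indexer", "PerplexityBot": "indexer", "Bingbot": "indexer",
    "ChatGPT-User": "fetcher", "Claude-Web": "fetcher",
}


def audit(robots_txt, path="/"):
    """Hypothetical helper: map each kind to the agents robots_txt blocks at path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    report = {"trainer": [], "indexer": [], "fetcher": []}
    for agent, kind in AGENT_KIND.items():
        if not parser.can_fetch(agent, path):
            report[kind].append(agent)
    return report


# Example: a blanket "block AI" file that also catches a fetcher.
sample = """\
User-agent: GPTBot
User-agent: ChatGPT-User
Disallow: /
"""
for kind, blocked in audit(sample).items():
    print(f"{kind}: {', '.join(blocked) or 'none blocked'}")
```

In this sample, the ChatGPT-User line is the costly one: it blocks a fetcher, not a trainer.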