GEO AI
Crawler Rules Specification

The robots.txt directives that control which AI search engines can crawl your site — ten known AI crawlers, their user-agent strings, and the access posture the Analyzer expects.

AI crawlers

The Analyzer tracks ten AI crawlers across the major AI search platforms. Each crawler is identified by its User-agent string in robots.txt. The table below lists all ten with their provider and the AI engine they serve.

| User-agent | Provider | AI Engine |
| --- | --- | --- |
| GPTBot | OpenAI | ChatGPT / GPT-4o |
| ChatGPT-User | OpenAI | ChatGPT browsing |
| ClaudeBot | Anthropic | Claude |
| Claude-Web | Anthropic | Claude web search |
| PerplexityBot | Perplexity AI | Perplexity |
| Google-Extended | Google | Gemini / AI Overviews |
| CCBot | Common Crawl | Training data (multiple) |
| Bytespider | ByteDance | Doubao / Grok training |
| Amazonbot | Amazon | Alexa / Amazon Q |
| meta-externalagent | Meta | Meta AI |

User-agent matching is case-insensitive

The Analyzer normalizes user-agent strings to lowercase before matching, so robots.txt entries for GPTBot, gptbot, or Gptbot all resolve to the same crawler.
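The normalization described above can be sketched as follows. KNOWN_CRAWLERS and matchCrawler are illustrative names for this example, not the Analyzer's actual API.

```typescript
// The ten AI crawlers tracked by the Analyzer, in their canonical casing.
const KNOWN_CRAWLERS: string[] = [
  "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web", "PerplexityBot",
  "Google-Extended", "CCBot", "Bytespider", "Amazonbot", "meta-externalagent",
];

// Lowercase both sides before comparing, so "GPTBot", "gptbot", and
// "Gptbot" all resolve to the same tracked crawler.
function matchCrawler(userAgent: string): string | undefined {
  const needle = userAgent.trim().toLowerCase();
  return KNOWN_CRAWLERS.find((c) => c.toLowerCase() === needle);
}
```

An unrecognized user-agent simply returns undefined rather than raising an error, since robots.txt files routinely contain blocks for crawlers the Analyzer does not track.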

robots.txt format

To explicitly allow all ten AI crawlers, add a User-agent block for each one with an Allow: / directive. Place these blocks before any wildcard User-agent: * block to ensure they take precedence.

robots.txt
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: meta-externalagent
Allow: /

User-agent: *
Disallow: /admin/
Disallow: /private/

Explicit Allow: / is recommended, not required

If your robots.txt has no Disallow: / rule for a crawler, access is implicitly allowed. Explicit Allow: / rules are recommended because they make your intent unambiguous and are easier to audit.

Crawler classification

The Analyzer classifies each of the ten crawlers into one of three states based on the parsed robots.txt rules:

| Classification | Definition |
| --- | --- |
| allowed | An explicit Allow: / rule exists for this crawler, or a Disallow: / rule exists but is overridden by Allow: / in the same block. |
| blocked | A Disallow: / rule exists for this crawler with no overriding Allow: / rule, or the wildcard block has Disallow: / and no exact block exists for this crawler. |
| unspecified | No block exists for this crawler and the wildcard block (if present) has no Disallow: / rule. Access is implicitly permitted but not explicitly stated. |

Precedence rules: an exact User-agent block takes priority over the wildcard User-agent: * block. Within a block, Allow rules can override Disallow rules.
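The three-state classification and the precedence rules above can be sketched like this. The type and function names (RuleBlock, classifyCrawler) are assumptions for illustration, not the Analyzer's real implementation.

```typescript
type Classification = "allowed" | "blocked" | "unspecified";

interface RuleBlock {
  userAgents: string[]; // lowercased user-agent names; "*" for the wildcard block
  allow: string[];      // paths from Allow: directives
  disallow: string[];   // paths from Disallow: directives
}

function classifyCrawler(crawler: string, blocks: RuleBlock[]): Classification {
  const name = crawler.toLowerCase();
  // An exact User-agent block takes priority over the wildcard block.
  const exact = blocks.find((b) => b.userAgents.includes(name));
  const wildcard = blocks.find((b) => b.userAgents.includes("*"));

  if (exact) {
    // Within a block, Allow: / overrides Disallow: /.
    if (exact.allow.includes("/")) return "allowed";
    if (exact.disallow.includes("/")) return "blocked";
    return "unspecified";
  }
  // No exact block: a wildcard Disallow: / without an overriding
  // Allow: / blocks the crawler; otherwise access is unspecified.
  if (wildcard && wildcard.disallow.includes("/") && !wildcard.allow.includes("/")) {
    return "blocked";
  }
  return "unspecified";
}
```

For the example robots.txt shown earlier, GPTBot would classify as allowed (exact block with Allow: /), while a crawler with no exact block would fall through to the wildcard rules.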

Analyzer checks

The checkCrawlerRules function fetches {origin}/robots.txt, parses it into blocks, and classifies each of the ten AI crawlers:

| Condition | Status | Recommendation code |
| --- | --- | --- |
| robots.txt present, no global block, fewer than 6 of 10 crawlers blocked | pass | — |
| robots.txt returns HTTP 404 | warn | ADD_ROBOTS_TXT |
| 6 or more of the 10 AI crawlers blocked | warn | REVIEW_AI_CRAWLER_ACCESS |
| Global Disallow: / in the wildcard block, or all 10 crawlers explicitly blocked | fail | ALLOW_AI_CRAWLERS |
| Network error while fetching robots.txt | unknown | — |

A global block is defined as a User-agent: * block with Disallow: / and no overriding Allow: /. This blocks all crawlers — including AI search engines — regardless of any other rules in the file.
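The mapping from per-crawler classifications to an overall status could look like the sketch below. decideStatus is an illustrative name, and the network-error case (the unknown row) is assumed to be handled before this function is reached.

```typescript
type Status = "pass" | "warn" | "fail";

function decideStatus(
  classifications: string[], // "allowed" | "blocked" | "unspecified", one per crawler
  globalBlock: boolean,      // wildcard Disallow: / with no overriding Allow: /
  robotsTxtFound: boolean,   // false when robots.txt returned HTTP 404
): Status {
  if (!robotsTxtFound) return "warn"; // ADD_ROBOTS_TXT
  const blocked = classifications.filter((c) => c === "blocked").length;
  // A global block, or every tracked crawler blocked, is a hard failure.
  if (globalBlock || blocked === classifications.length) return "fail"; // ALLOW_AI_CRAWLERS
  if (blocked >= 6) return "warn";    // REVIEW_AI_CRAWLER_ACCESS
  return "pass";
}
```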

Common mistakes

  • Global Disallow: / with no AI-specific Allow rules. A User-agent: * block with Disallow: / blocks every crawler that does not have its own explicit block. If you need to restrict general crawlers but allow AI engines, add individual User-agent blocks for each AI crawler above the wildcard block.
  • Blocking crawlers by provider instead of user-agent. Some sites block entire IP ranges or use firewall rules to block crawlers. The Analyzer only reads robots.txt — network-level blocks are not visible to it, but they still prevent AI engines from accessing your content.
  • Missing robots.txt entirely. Without a robots.txt file, AI crawlers have no explicit guidance. The Analyzer returns warn and recommends creating the file. While implicit access is permitted, an explicit file signals that you have considered AI crawler access.
  • Blocking training crawlers but not search crawlers. CCBot and Bytespider are primarily used for training data collection. Blocking them is a reasonable choice, but blocking GPTBot, ClaudeBot, or PerplexityBot also prevents those engines from indexing your site for search results. Distinguish between training and search crawlers when writing your rules.
  • Incorrect user-agent casing in the file. While the Analyzer normalizes casing, some robots.txt parsers are case-sensitive. Use the exact casing shown in the table above (GPTBot, not gptbot) to ensure compatibility across all parsers.
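To avoid the casing and ordering pitfalls above, the per-crawler Allow blocks from the earlier example can be generated rather than hand-written. This is a sketch; aiAllowBlocks is an illustrative name, and the list mirrors the crawler table with its exact casing preserved for case-sensitive parsers.

```typescript
const AI_CRAWLERS: string[] = [
  "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web", "PerplexityBot",
  "Google-Extended", "CCBot", "Bytespider", "Amazonbot", "meta-externalagent",
];

// Emit one "User-agent / Allow: /" block per crawler, separated by blank lines.
function aiAllowBlocks(): string {
  return AI_CRAWLERS.map((ua) => `User-agent: ${ua}\nAllow: /\n`).join("\n");
}
```

Prepend the output above your User-agent: * block so the exact blocks take precedence over any wildcard Disallow rules.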
© 2026 GEO AI · Open Source · GPL-2.0 License