robots.txt Optimization

5.1 robots.txt Basics

robots.txt is a plain-text file placed at your website’s root (yourdomain.com/robots.txt) that tells crawlers which pages they may or may not access. All well-behaved crawlers — including AI agent crawlers — read this file before crawling your site.
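
The format is simple: each rule group begins with a User-agent line naming a crawler (or * for all crawlers), followed by Allow and Disallow path rules. A minimal sketch (yourdomain.com is a placeholder):

```
# Applies to every crawler without a more specific group
User-agent: *
Disallow: /admin/

# A named group overrides the * group for that crawler
User-agent: GPTBot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```

Note that a crawler obeys only the most specific group that matches its User-Agent token, not the * group plus its own group combined.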

5.2 Traditional Crawlers vs AI Crawlers

Between 2024 and 2026, a large number of new AI crawlers have emerged. They use different User-Agent strings from traditional search engine crawlers:
| Crawler | Operator | User-Agent | Purpose |
| --- | --- | --- | --- |
| Googlebot | Google | Googlebot | Traditional search indexing |
| Bingbot | Microsoft | bingbot | Traditional search indexing |
| ChatGPT-User | OpenAI | ChatGPT-User | ChatGPT real-time browsing |
| GPTBot | OpenAI | GPTBot | AI training and search |
| Claude-Web | Anthropic | Claude-Web | Claude real-time browsing |
| ClaudeBot | Anthropic | ClaudeBot | AI training |
| PerplexityBot | Perplexity | PerplexityBot | AI search engine |
| Applebot-Extended | Apple | Applebot-Extended | Apple Intelligence |
| Google-Extended | Google | Google-Extended | Gemini AI training |
| cohere-ai | Cohere | cohere-ai | AI training |
5.3 A Recommended Configuration for E-commerce

For e-commerce sites that want to maximize AI visibility:
# Traditional search engines — allow all
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

# AI agent browsing — allow (these agents recommend your products)
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Applebot-Extended
Allow: /

# AI training crawlers — decide based on your preference
# If you want AI models to learn about your brand (recommended):
User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /

# If you do not want your content used for training:
# User-agent: GPTBot
# Disallow: /

# Universal rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /cart/
Disallow: /account/
Disallow: /api/

# Sitemap location
Sitemap: https://yourdomain.com/sitemap.xml

5.4 AI Training vs AI Browsing: An Important Distinction

| Type | Representative Crawlers | Purpose | Consequence of Blocking |
| --- | --- | --- | --- |
| AI Browsing | ChatGPT-User, Claude-Web | Real-time page fetching when users ask questions | AI agents cannot see your latest content |
| AI Training | GPTBot, Google-Extended | Crawling content to train AI models | AI knowledge bases will not include your information |
Recommendation: always allow AI browsing crawlers; otherwise AI agents cannot see your pages at the moment they would recommend you. Whether to allow AI training crawlers is a business decision, but allowing them generally means AI models build a better understanding of your brand and products.
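
Because browsing and training agents use distinct User-Agent tokens, blocking one does not affect the other. This can be verified with Python's standard-library robotparser (the rules string below is illustrative, not a real site's file):

```python
from urllib import robotparser

# Blocking the training crawler (GPTBot) does not block the
# browsing agent (ChatGPT-User): they match different groups.
rules = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://yourdomain.com/products/"))        # False
print(rp.can_fetch("ChatGPT-User", "https://yourdomain.com/products/"))  # True
```

ChatGPT-User falls through to the * group here, so it remains allowed even though GPTBot is fully blocked.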

5.5 robots.txt Management by Platform

Shopify

Shopify controls robots.txt through the theme file robots.txt.liquid:
  1. Online Store → Themes → Edit code
  2. Open robots.txt.liquid (if the template does not exist yet, create it via Add a new template → robots)
  3. Add the AI crawler rules you need

WordPress / WooCommerce

WordPress auto-generates robots.txt. Customize via:
  1. Yoast SEO: SEO → Tools → File editor
  2. Rank Math: General Settings → Edit robots.txt
  3. Manual: create a physical robots.txt file in the WordPress root directory (it overrides the auto-generated version)

Self-Hosted Sites

Simply create or edit the robots.txt file in your website’s root directory.

5.6 Common Mistakes

| Mistake | Consequence | Fix |
| --- | --- | --- |
| No robots.txt at all | All crawlers allowed by default (acceptable but unprofessional) | Create one |
| Disallow: / blocks everything | AI agents cannot see any of your pages | Disallow only private paths (admin, cart, checkout) |
| ChatGPT-User / Claude-Web blocked | AI agents cannot fetch real-time content when recommending you | Remove those rules |
| No Sitemap declaration | Crawlers may miss pages | Add a Sitemap: line |
| Syntax errors in robots.txt | Rules may not take effect | Validate with the robots.txt report in Google Search Console |

5.7 Verification

  1. Visit yourdomain.com/robots.txt and confirm the file exists with correct formatting
  2. Use the robots.txt report in Google Search Console to validate your rules (Google retired its standalone robots.txt Tester in 2023)
  3. Confirm that AI crawler User-Agents do not appear under any Disallow rules
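
Step 3 can also be automated with Python's standard-library urllib.robotparser. This is a sketch: the AI_AGENTS list simply mirrors the table in section 5.2 (it is not an official registry), and the probe URL is a placeholder.

```python
from urllib import robotparser

# Illustrative AI User-Agent tokens from section 5.2 -- extend as needed
AI_AGENTS = ["ChatGPT-User", "GPTBot", "Claude-Web", "ClaudeBot",
             "PerplexityBot", "Applebot-Extended", "Google-Extended"]

def blocked_agents(robots_txt, probe_url="https://yourdomain.com/"):
    """Return the AI agents that may NOT fetch probe_url under these rules."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [ua for ua in AI_AGENTS if not rp.can_fetch(ua, probe_url)]

sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""
print(blocked_agents(sample))  # ['GPTBot']
```

An empty result for your homepage and key product pages means none of the listed AI crawlers are locked out.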

Next chapter: Writing llms.txt — Your company brief for AI agents