robots.txt Optimization
5.1 robots.txt Basics
robots.txt is a plain-text file placed at your website’s root (yourdomain.com/robots.txt) that tells crawlers which pages they may or may not access.
All well-behaved crawlers — including AI agent crawlers — read this file before crawling your site.
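The format is line-based: each block names a crawler with a `User-agent` line, followed by `Allow`/`Disallow` rules. A minimal sketch (the domain and path are placeholders):

```txt
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
```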
5.2 Traditional Crawlers vs AI Crawlers
Between 2024 and 2026, a large number of new AI crawlers emerged. They use different User-Agent strings from traditional search engine crawlers:

| Crawler | Operator | User-Agent | Purpose |
|---|---|---|---|
| Googlebot | Google | Googlebot | Traditional search indexing |
| Bingbot | Microsoft | bingbot | Traditional search indexing |
| ChatGPT-User | OpenAI | ChatGPT-User | ChatGPT real-time browsing |
| GPTBot | OpenAI | GPTBot | AI training and search |
| Claude-Web | Anthropic | Claude-Web | Claude real-time browsing |
| ClaudeBot | Anthropic | ClaudeBot | AI training |
| PerplexityBot | Perplexity | PerplexityBot | AI search engine |
| Applebot-Extended | Apple | Applebot-Extended | Apple Intelligence |
| Google-Extended | Google | Google-Extended | Gemini AI training |
| cohere-ai | Cohere | cohere-ai | AI training |
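To see which of these crawlers actually visit your site, you can scan your server access logs for their User-Agent strings. A minimal sketch in Python; the sample log lines below are stand-ins for a real log file:

```python
import re

# User-Agent substrings from the table above
AI_AGENTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web",
             "PerplexityBot", "Applebot-Extended", "Google-Extended", "cohere-ai"]
AI_PATTERN = re.compile("|".join(re.escape(agent) for agent in AI_AGENTS))

# Hypothetical log lines; in practice, read these from your access log.
log_lines = [
    '203.0.113.9 - - "GET /products/widget HTTP/1.1" 200 "Mozilla/5.0; compatible; GPTBot/1.2"',
    '198.51.100.4 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '203.0.113.7 - - "GET /faq HTTP/1.1" 200 "Mozilla/5.0; PerplexityBot/1.0"',
]

# Keep only requests made by known AI crawlers
ai_hits = [line for line in log_lines if AI_PATTERN.search(line)]
print(len(ai_hits))  # → 2
```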
5.3 Recommended Configuration
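A permissive baseline along these lines, assuming /admin/, /cart/, and /checkout/ are the private paths on your platform (substitute your own):

```txt
# Allow all crawlers — traditional and AI alike — everywhere except private paths
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/

Sitemap: https://yourdomain.com/sitemap.xml
```

Note that a named group (e.g. `User-agent: GPTBot`) replaces the `*` group entirely for that crawler, so only add one if you genuinely want different rules for it.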
The goal for e-commerce sites that want to maximize AI visibility is simple: allow AI crawlers, block only genuinely private pages, and always declare a sitemap.

5.4 AI Training vs AI Browsing: An Important Distinction
| Type | Representative Crawlers | Purpose | Consequence of Blocking |
|---|---|---|---|
| AI Browsing | ChatGPT-User, Claude-Web | Real-time page fetching when users ask questions | AI agents cannot see your latest content |
| AI Training | GPTBot, Google-Extended | Crawling content to train AI models | AI knowledge base will not include your information |
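If you decide to allow real-time browsing but opt out of training, the distinction maps directly onto per-crawler groups. A sketch (a policy choice, not a recommendation):

```txt
# Opt out of AI training
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Keep real-time browsing open
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-Web
Allow: /
```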
5.5 robots.txt Management by Platform
Shopify
Shopify controls robots.txt through the theme file robots.txt.liquid:
- Online Store → Themes → Edit code
- Find robots.txt.liquid
- Add the AI crawler rules you need
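Inside robots.txt.liquid, the usual pattern is to render Shopify's default groups and append custom rules after the loop. A sketch, assuming the `robots.default` Liquid objects documented by Shopify (verify against the current Shopify docs before shipping):

```liquid
{% for group in robots.default.groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules %}
    {{ rule }}
  {%- endfor %}
  {%- if group.sitemap != blank %}
    {{ group.sitemap }}
  {%- endif %}
{% endfor %}

User-agent: GPTBot
Allow: /
```

Keeping the default loop intact preserves Shopify's built-in rules; your additions go after it.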
WordPress / WooCommerce
WordPress auto-generates robots.txt. Customize via:
- Yoast SEO: SEO → Tools → File editor
- RankMath: General Settings → Edit .htaccess and robots.txt
- Manual: Create a physical robots.txt file in the WordPress root directory (overrides the auto-generated version)
Self-Hosted Sites
Simply create or edit the robots.txt file in your website’s root directory.
5.6 Common Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| No robots.txt at all | All crawlers allowed by default (acceptable but unprofessional) | Create one |
| Disallow: / blocks everything | AI agents cannot see any of your pages | Block only admin pages |
| ChatGPT-User/Claude-Web blocked | AI agents cannot fetch real-time content when recommending you | Remove those rules |
| No Sitemap declaration | Crawlers may miss pages | Add a Sitemap: line |
| Syntax errors in robots.txt | Rules may not take effect | Validate with Search Console’s robots.txt report |
5.7 Verification
- Visit yourdomain.com/robots.txt and confirm the file exists with correct formatting
- Validate the rules with Google Search Console’s robots.txt report
- Confirm that AI crawler User-Agents do not appear under any Disallow rules
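The last check can also be automated with Python's standard-library urllib.robotparser. The rules string below is a stand-in for your live file:

```python
from urllib.robotparser import RobotFileParser

# Stand-in rules: GPTBot is kept out of /admin/, everyone else may crawl freely.
RULES = """\
User-agent: GPTBot
Disallow: /admin/

User-agent: *
Disallow:
"""

def check_agents(rules: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return {user_agent: allowed} for one URL under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

results = check_agents(RULES, ["GPTBot", "ChatGPT-User"], "https://example.com/admin/orders")
print(results)  # → {'GPTBot': False, 'ChatGPT-User': True}
```

Run this against your real robots.txt (fetched or read from disk) with the full list of AI User-Agents from section 5.2 to confirm none of them are accidentally blocked.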
Next chapter: Writing llms.txt — Your company brief for AI agents