Sitemap Optimization

8.1 The New Role of Sitemaps in the AI Era

Traditionally, Sitemaps served as a “page directory” for Google and Bing. In the age of AI agents, the Sitemap’s role has expanded:
  • Traditional search engines: Discover pages, build an index, return search results
  • AI agents: Discover pages, extract structured data, make product recommendations
Both AI agent crawlers and traditional search engine crawlers rely on Sitemaps. A well-optimized Sitemap enables all of your products to be discovered and understood by AI agents more quickly.

8.2 Basic Sitemap Format

The standard XML Sitemap format per the sitemaps.org specification:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/product/shoe-001</loc>
    <lastmod>2026-04-10</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/product/shoe-002</loc>
    <lastmod>2026-04-08</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

8.3 Sitemap Index: Sharding Strategy

When your site has a large number of pages, use a Sitemap Index to manage multiple Sitemap shards:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-04-10</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products-1.xml</loc>
    <lastmod>2026-04-10</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products-2.xml</loc>
    <lastmod>2026-04-09</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-categories.xml</loc>
    <lastmod>2026-04-05</lastmod>
  </sitemap>
</sitemapindex>
Sharding rules:
  • Each Sitemap file may contain at most 50,000 URLs and must be no larger than 50 MB (uncompressed)
  • Shard by content type: products, categories, blog posts, and policy pages each get their own Sitemap
  • When product count exceeds 50,000, further shard alphabetically or by category
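The sharding rules above can be sketched as a small generator. This is a minimal illustration, not a production implementation: `product_urls` is a hypothetical list of canonical product URLs, and the `sitemap-products-N.xml` filenames mirror the index example above.

```python
from datetime import date

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the sitemaps.org spec

def shard_sitemaps(product_urls, base="https://example.com"):
    """Split URLs into <=50,000-entry Sitemap shards and build an index file."""
    shards = []
    for i in range(0, len(product_urls), MAX_URLS_PER_SITEMAP):
        chunk = product_urls[i:i + MAX_URLS_PER_SITEMAP]
        body = "\n".join(
            f"  <url><loc>{u}</loc><lastmod>{date.today().isoformat()}</lastmod></url>"
            for u in chunk
        )
        shards.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}\n</urlset>"
        )
    # One <sitemap> entry in the index per generated shard
    index_body = "\n".join(
        f"  <sitemap><loc>{base}/sitemap-products-{n + 1}.xml</loc></sitemap>"
        for n in range(len(shards))
    )
    index = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{index_body}\n</sitemapindex>"
    )
    return shards, index
```

With 60,000 product URLs this yields two shards plus an index referencing both.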

8.4 E-Commerce Sitemap Best Practices

Product Page Sitemap

The product page Sitemap is the most important one for e-commerce sites. Key optimization points:
  1. lastmod must be accurate: Update this field whenever product price, inventory, or description changes. AI crawlers use this to determine whether re-crawling is needed
  2. Tiered priority values:
    • Popular / new products: 0.9
    • Standard in-stock products: 0.7-0.8
    • Delisted products with retained pages: 0.3
  3. Only include accessible URLs: Do not include 404 pages or redirect targets in the Sitemap
  4. Canonicalize URLs: Each product should have exactly one canonical URL (avoid duplicate URLs with query parameters)
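The tiered-priority rules can be expressed as a small mapping function. This is a sketch under assumed inputs: `product` is a hypothetical dict with `url`, `updated_at`, and boolean `is_popular`, `is_new`, and `delisted` fields; your catalog schema will differ.

```python
from xml.sax.saxutils import escape  # escape &, <, > in URLs

def product_url_entry(product):
    """Render one Sitemap <url> entry from a (hypothetical) product record,
    applying the priority tiers described above."""
    if product.get("delisted"):
        priority = "0.3"   # delisted product with retained page
    elif product.get("is_popular") or product.get("is_new"):
        priority = "0.9"   # popular / new product
    else:
        priority = "0.8"   # standard in-stock product
    return (
        "  <url>\n"
        f"    <loc>{escape(product['url'])}</loc>\n"
        f"    <lastmod>{product['updated_at']}</lastmod>\n"
        f"    <priority>{priority}</priority>\n"
        "  </url>"
    )
```

Binding `lastmod` to the record's real `updated_at` timestamp is what makes point 1 above hold automatically.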

Category Page Sitemap

Category pages help AI agents understand your product taxonomy:
<url>
  <loc>https://example.com/category/running-shoes</loc>
  <lastmod>2026-04-10</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.7</priority>
</url>

Policy Pages

Privacy policies, return policies, and similar pages should also be included in the Sitemap:
<url>
  <loc>https://example.com/privacy-policy</loc>
  <lastmod>2026-03-01</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.3</priority>
</url>

8.5 Dynamic Sitemap Generation

For e-commerce sites with frequent product changes, manual Sitemap maintenance is impractical. Dynamic generation is recommended.

Principles

  1. Reflect database state in real time: Automatically update the Sitemap when products are listed, delisted, or repriced
  2. Incremental lastmod updates: Only update entries that actually changed
  3. Cache with periodic refresh: Sitemaps can be cached (e.g., 4 hours), but proactively refresh when products change
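The cache-with-invalidation principle can be sketched as follows. The `build_fn` callable stands in for whatever renders XML from your database; the 4-hour TTL matches the example above.

```python
import time

CACHE_TTL = 4 * 3600  # seconds; the 4-hour example from principle 3

class SitemapCache:
    """Serve a cached Sitemap, rebuilding on TTL expiry or explicit invalidation."""

    def __init__(self, build_fn, ttl=CACHE_TTL):
        self.build_fn = build_fn   # renders sitemap XML from the database
        self.ttl = ttl
        self._xml = None
        self._built_at = 0.0

    def get(self):
        # Rebuild if never built, invalidated, or older than the TTL
        if self._xml is None or time.time() - self._built_at > self.ttl:
            self._xml = self.build_fn()
            self._built_at = time.time()
        return self._xml

    def invalidate(self):
        """Call when a product is listed, delisted, or repriced (principle 1)."""
        self._xml = None
```

A request handler would call `cache.get()`, while product-update hooks call `cache.invalidate()` so the next request rebuilds immediately instead of waiting out the TTL.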

Platform Solutions

Platform | Approach
Shopify | Auto-generated; no manual maintenance needed
WooCommerce | Auto-generated via Yoast SEO or RankMath
Next.js | Use the next-sitemap package
Self-hosted | Dynamically generate XML from the database

8.6 Notifying Search Engines and AI Agents of Updates

After creating or updating a Sitemap, proactively notify search engines and AI crawlers rather than waiting for them to discover the change on their own.

robots.txt Declaration

Add the following to the bottom of robots.txt:
Sitemap: https://yourdomain.com/sitemap.xml

Ping Search Engines

# Google retired its sitemap "ping" endpoint in 2023; submit the Sitemap
# in Google Search Console or declare it in robots.txt instead.

# Bing accepts Sitemap submissions in Bing Webmaster Tools, but IndexNow
# (below) is more efficient for pushing individual URL changes:
curl -X POST "https://api.indexnow.org/indexnow" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d '{"host":"yourdomain.com","key":"your-indexnow-key","urlList":["https://yourdomain.com/new-product"]}'

IndexNow Protocol

IndexNow lets you notify search engines and AI crawlers in real time that your site has updates. This is far more efficient than waiting for crawlers to visit on their own schedule. Bing, Yandex, Seznam, and others already support IndexNow. Setup is straightforward:
  1. Generate an API key
  2. Place the key verification file in the root directory
  3. POST to the IndexNow API whenever a page is updated
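Step 3 can be sketched in Python using only the standard library. The payload fields (`host`, `key`, `urlList`) follow the IndexNow protocol; the domain, key, and URLs shown are placeholders you would replace with your own.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def indexnow_payload(host, key, urls):
    """Build the JSON body the IndexNow API expects."""
    return {"host": host, "key": key, "urlList": list(urls)}

def submit_indexnow(host, key, urls):
    """POST changed URLs to the IndexNow endpoint; returns the HTTP status.

    Example (placeholder values):
        submit_indexnow("yourdomain.com", "your-indexnow-key",
                        ["https://yourdomain.com/new-product"])
    """
    req = urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(indexnow_payload(host, key, urls)).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Hooking `submit_indexnow` into the same product-update events that refresh the Sitemap keeps both channels in sync.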

8.7 Common Mistakes

Mistake | Consequence | Fix
lastmod never changes | Crawlers cannot tell which pages have been updated | Bind lastmod to actual update timestamps
Including 404 or redirect URLs | Wastes crawler resources; reduces Sitemap credibility | Periodically clean invalid URLs
URLs contain session parameters | The same page appears multiple times | Use canonical URLs
File exceeds 50 MB | Crawlers may truncate it | Shard the Sitemap
Not declared in robots.txt | Crawlers may not find the Sitemap | Add a Sitemap: line

8.8 Validation Tools

  1. Google Search Console — Sitemaps section: submit your Sitemap URL
  2. XML Sitemap Validator — Validate format
  3. Manual check: Visit yourdomain.com/sitemap.xml and confirm the format is correct and URLs are accessible
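The manual format check can be partially automated with the standard library's XML parser. This is a minimal sketch: it verifies the Sitemap parses, stays under the URL limit, and contains only absolute URLs; it does not fetch each URL to confirm accessibility.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def check_sitemap(xml_text, max_urls=50_000):
    """Parse a Sitemap and return its URLs, enforcing basic sanity rules."""
    if isinstance(xml_text, str):
        # ElementTree rejects str input carrying an encoding declaration
        xml_text = xml_text.encode("utf-8")
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(f"{NS}loc")]
    assert len(locs) <= max_urls, "too many URLs; shard the Sitemap"
    assert all(u.startswith(("https://", "http://")) for u in locs), \
        "every <loc> must be an absolute URL"
    return locs
```

Running this against your live `sitemap.xml` (fetched however you prefer) catches malformed XML and relative URLs before a crawler does.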

8.9 Self-Check Checklist

  • Sitemap exists and includes all in-stock product pages
  • lastmod reflects actual update times
  • No 404 or redirect URLs are included
  • Sitemap location is declared in robots.txt
  • Submitted to Google Search Console
  • Sharded if product count exceeds 50,000

Next chapter: Policy Quality — How privacy and return policies affect AI trust assessments