This post was sponsored by Alli AI. The opinions expressed in this article are the sponsor’s own.
Everyone assumes Googlebot is the dominant crawler hitting their website. That assumption is now wrong.
We analyzed 24,411,048 proxy requests across 78,000+ pages on 69 customer websites on Alli AI’s crawler enablement platform over a 55-day period (January to March 2026). OpenAI’s ChatGPT-User crawler made 3.6x more requests than Googlebot across our data sample. And that’s not even counting GPTBot, OpenAI’s separate training crawler.
A note on methodology: Crawler identification used user agent string matching, verified against published IP ranges. Request metrics are measured at the proxy/CDN layer. The dataset covers 69 websites across a variety of industries and sizes, predominantly WordPress-based. Full methodology is detailed at the end.
Finding 1: AI Crawlers Now Outpace Google 3.6x & ChatGPT Leads the Pack
When we ranked every identified crawler by request volume, the results were unambiguous:
| Rank | Crawler | Requests | Category |
|------|---------|----------|----------|
| 1 | ChatGPT-User (OpenAI) | 133,361 | AI Search |
| 2 | Googlebot | 37,426 | Traditional Search |
| 3 | Amazonbot | 35,728 | AI / E-Commerce |
| 4 | Bingbot | 18,280 | Traditional Search |
| 5 | ClaudeBot (Anthropic) | 13,918 | AI Search |
| 6 | MetaBot | 10,756 | Social |
| 7 | GPTBot (OpenAI) | 8,864 | AI Training |
| 8 | Applebot | 6,794 | AI Search |
| 9 | Bytespider (ByteDance) | 6,644 | AI Training |
| 10 | PerplexityBot | 5,731 | AI Search |
ChatGPT-User made more requests than Googlebot, Amazonbot, and Bingbot combined.
Grouped by purpose, AI-related crawlers (ChatGPT-User, GPTBot, ClaudeBot, Amazonbot, Applebot, Bytespider, PerplexityBot, CCBot) made 213,477 requests versus 59,353 for traditional search crawlers (Googlebot, Bingbot, YandexBot). AI crawlers are now making 3.6x more requests than traditional search crawlers across our network.
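Both headline claims can be re-derived from the figures reported above. A quick sanity check in Python, using only the numbers from the table and the category totals in this article:

```python
# Request counts from the top-10 table above.
requests = {
    "ChatGPT-User": 133_361,
    "Googlebot": 37_426,
    "Amazonbot": 35_728,
    "Bingbot": 18_280,
}

# Claim 1: ChatGPT-User exceeds Googlebot, Amazonbot, and Bingbot combined.
combined = requests["Googlebot"] + requests["Amazonbot"] + requests["Bingbot"]
print(combined)  # prints 91434, well below ChatGPT-User's 133,361

# Claim 2: AI crawlers vs. traditional search crawlers, category totals as reported.
ai_total, traditional_total = 213_477, 59_353
print(round(ai_total / traditional_total, 1))  # prints 3.6
```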
Finding 2: OpenAI Uses 2 Crawlers (And Most Sites Don’t Know the Difference)
OpenAI operates two distinct crawlers with very different purposes.
ChatGPT-User is the retrieval crawler. It fetches pages in real time when users ask ChatGPT questions that require up-to-date web information. This determines whether your content appears in ChatGPT’s answers.
GPTBot is the training crawler. It collects data to improve OpenAI’s models. Many sites block GPTBot via robots.txt but not ChatGPT-User, or vice versa, without understanding the distinct consequences of each.
Combined, OpenAI’s crawlers made 142,225 requests: 3.8x Googlebot’s volume.
The robots.txt directives are separate, and each crawler group needs its own rule:

```
# Training crawler: feeds OpenAI's models
User-agent: GPTBot
Allow: /

# Retrieval crawler: fetches pages for ChatGPT answers
User-agent: ChatGPT-User
Allow: /
```
Finding 3: AI Crawlers Are Faster & More Reliable, But Their Volume Adds Up
AI crawlers are significantly more efficient per request:
| Crawler | Avg Response Time | 200 Success Rate |
|---------|-------------------|------------------|
| PerplexityBot | 8ms | 100% |
| ChatGPT-User | 11ms | 99.99% |
| GPTBot | 12ms | 99.9% |
| ClaudeBot | 21ms | 99.9% |
| Bingbot | 42ms | 98.4% |
| Googlebot | 84ms | 96.3% |
Two likely reasons. First, AI retrieval crawlers are fetching specific pages in response to user queries, not exhaustively discovering site architecture. They know what they want, they grab it, and they leave. Second, while all crawlers on our infrastructure receive pre-rendered responses, Googlebot’s broader crawl pattern means it requests a wider range of URLs, including stale paths from sitemaps and its own legacy index, which adds latency from redirect chains and error handling that retrieval crawlers avoid entirely.
But there’s a catch: while each individual request is lightweight, the sheer volume means aggregate server load is substantial. ChatGPT-User at 11ms × 133,361 requests is still a real infrastructure cost, just distributed differently than Googlebot’s fewer, heavier requests.
Finding 4: Googlebot Sees a Different (Worse) Version of Your Site
Googlebot’s 96.3% success rate versus near-perfect rates for AI crawlers reveals an important structural difference.
Googlebot received 624 blocked responses (403) and 480 not found errors (404), accounting for 3% of its requests. Meanwhile, ChatGPT-User achieved 99.99% success. PerplexityBot hit a perfect 100%.
Why the gap? The most likely explanation is index age and crawl behavior, not site misconfiguration.
Googlebot maintains a massive legacy index built over years of continuous crawling. It routinely re-requests URLs it already knows about — including pages that have since been deleted (404s) or restructured (403s). This is normal behavior for a search engine maintaining an index of this scale, but it means a meaningful percentage of Googlebot’s requests are directed at URLs that no longer exist.
AI crawlers don’t carry that baggage. ChatGPT-User fetches specific pages in response to real-time user queries, targeting content that’s currently relevant and linked. That’s a structural advantage that produces near-perfect success rates.
Industry Reports Confirm AI Crawling Surged 15x in 2025
These findings align with broader industry trends. Cloudflare’s 2025 analysis reported ChatGPT-User requests surging 2,825% YoY, with AI “user action” crawling increasing more than 15x over the course of 2025. Akamai identified OpenAI as the single largest AI bot operator, accounting for 42.4% of all AI bot requests. Vercel’s analysis of nextjs.org confirmed that none of the major AI crawlers currently render JavaScript.
Our data suggests that the crossover from Google-dominant to AI-dominant crawling may already be happening at the site level for properties that actively enable AI crawler access.
Your New SEO Strategy: How To Audit, Clean Up & Optimize For AI Crawlers
1. Audit your robots.txt for AI crawlers today
Most robots.txt files were written for a Googlebot-first world. At minimum, have explicit directives for ChatGPT-User, GPTBot, ClaudeBot, Amazonbot, PerplexityBot, Applebot, Bytespider, CCBot, and Google-Extended.
Our recommendation: Most businesses benefit from allowing both retrieval crawlers (ChatGPT-User, PerplexityBot, ClaudeBot) and training crawlers (GPTBot, CCBot, Bytespider), because training data is what teaches these models about your brand, products, and expertise. Blocking training crawlers today means AI models learn less about you tomorrow, which reduces your chances of being cited in AI-generated answers down the line.
The exception: if you have content you specifically need to protect from model training (proprietary research, gated content), use granular Disallow rules for those paths rather than blanket blocks.
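A minimal sketch of what that granular approach can look like in robots.txt (the /research/ path is a hypothetical placeholder; substitute the paths you actually need to protect):

```
# Keep proprietary research out of model training
User-agent: GPTBot
Disallow: /research/

# Retrieval crawler: allow everything so pages can appear in ChatGPT answers
User-agent: ChatGPT-User
Allow: /
```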
2. Clean up stale URLs in Google Search Console
Our data shows Googlebot hits a 3% error rate, mostly 403s and 404s, while AI crawlers achieve near-perfect success rates. That gap likely reflects Googlebot re-crawling legacy URLs that no longer exist, but those failed requests still consume crawl budget.
Audit your GSC crawl stats for recurring 404s and 403s. Set up proper redirects for restructured URLs and submit updated sitemaps.
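On an Apache/WordPress stack (the majority of sites in this dataset), a recurring 404 from a restructured URL can be resolved with a permanent redirect in .htaccess. Both paths below are hypothetical placeholders:

```
# Hypothetical example: point Googlebot's stale URL at the page's new home
Redirect 301 /old-blog/post-name /blog/post-name
```

A 301 tells Googlebot to update its index entry, so the stale URL eventually stops being re-crawled.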
3. Treat AI crawler accessibility as a distinct SEO channel
Ranking in ChatGPT’s answers, Perplexity’s results, and Claude’s responses is emerging as a distinct visibility channel. If your content isn’t accessible to these crawlers, particularly if you’re running JavaScript-heavy frameworks, you’re invisible in AI search.
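One way to spot-check this: fetch a page's raw HTML, which is exactly what a non-JavaScript-rendering crawler receives, and confirm your key content is present before any script runs. A minimal sketch; the user-agent string is illustrative, and the check is a simple substring match rather than full HTML parsing:

```python
import urllib.request

# Illustrative UA; OpenAI documents the exact ChatGPT-User token it sends.
AI_CRAWLER_UA = "Mozilla/5.0; compatible; ChatGPT-User/1.0; +https://openai.com/bot"

def fetch_raw_html(url: str, user_agent: str = AI_CRAWLER_UA) -> str:
    """Fetch the server-rendered HTML only, the way a non-JS crawler sees it."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def visible_to_ai_crawlers(html: str, key_phrase: str) -> bool:
    """True if the phrase is present in the HTML before any JavaScript executes."""
    return key_phrase.lower() in html.lower()
```

If `visible_to_ai_crawlers(fetch_raw_html(url), "your product name")` comes back False on a JavaScript-heavy page, that content is effectively invisible to these crawlers.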
If you want to see what this looks like in practice, we’ve published a live dashboard showing how AI crawler traffic breaks down across a real site: which platforms are visiting, how often, and their share of total traffic.
4. Plan for volume, not just individual request weight
AI crawlers send light, fast requests, but they send many of them. ChatGPT-User alone accounted for more than 133,000 requests in 55 days, and the aggregate server load from AI crawlers now likely exceeds your Googlebot load. Make sure your hosting and CDN can handle it. The low per-request response times in our data reflect the fact that Alli AI serves pre-rendered static HTML from the CDN edge, which is exactly the kind of architecture that absorbs this volume without taxing your origin server.
Methodology
This analysis is based on 24,411,048 HTTP proxy requests processed through Alli AI’s crawler enablement platform between January 14 and March 9, 2026, covering 69 customer websites.
Crawler identification used user agent string matching, verified against published IP ranges. For OpenAI crawlers specifically, every request was cross-referenced against OpenAI’s published CIDR ranges. This confirmed 100% of GPTBot requests and 99.76% of ChatGPT-User requests originated from OpenAI’s infrastructure. The remaining 0.24% (requests from spoofed user agents) were excluded.
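The verification step can be sketched in a few lines of Python with the standard-library ipaddress module. The CIDR ranges below are documentation-reserved placeholders, not OpenAI's actual list, which OpenAI publishes and rotates over time:

```python
import ipaddress

# Placeholder ranges for illustration; substitute OpenAI's currently published CIDRs.
OPENAI_NETWORKS = [ipaddress.ip_network(c) for c in ("192.0.2.0/24", "198.51.100.0/24")]

def is_verified_openai(user_agent: str, ip: str) -> bool:
    """UA token match first, then confirm the source IP sits in a published range."""
    if "GPTBot" not in user_agent and "ChatGPT-User" not in user_agent:
        return False
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in OPENAI_NETWORKS)
```

Requests whose user agent claimed to be OpenAI but whose IP fell outside the published ranges are the spoofed 0.24% excluded above.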
Limitations: The dataset is scoped to Alli AI customers who have opted into crawler enablement. Crawlers that don’t self-identify via user agent are not captured. Response time measurements are at the proxy layer, not the origin server.
About Alli AI
Alli AI provides server-side rendering infrastructure for AI and search engine crawlers. This analysis was produced using data from our proxy infrastructure to help the SEO community better understand the evolving crawler landscape.
Want to see this data in action? See the breakdown firsthand by visiting our AI visibility dashboard.
Image Credits
Featured Image: Image by Alli AI. Used with permission.
In-Post Images: Images by Alli AI. Used with permission.