What is Crawl Budget?

Crawl budget is the number of pages Googlebot will crawl on your website within a given timeframe. It's determined by two factors: crawl rate limit (how fast Google can crawl without overloading your server) and crawl demand (how much Google wants to crawl your pages based on their popularity and freshness). Managing crawl budget is critical for large sites where not all pages get crawled and indexed in a reasonable time.

500ms: server response time threshold above which Googlebot crawls less (Source: Google Search Central, 2024)

Fact-checked against 3 sources. Last updated 8 June 2026.

Key Takeaways

Crawl budget combines crawl rate limit and crawl demand (often written as crawl rate limit × crawl demand); optimise both factors.
It matters mainly on large sites (roughly 10,000+ indexable URLs); small sites are usually crawled in full by default.
Blocking low-value pages (tag archives, filtered URLs) frees budget for pages that matter.
Fast server responses (target TTFB under 200ms) are one of the most impactful crawl budget levers.
Check your crawl stats in Google Search Console → Settings → Crawl Stats.

In this article

01 What Determines Your Crawl Budget?
02 Signs You Have a Crawl Budget Problem
03 How to Optimise Crawl Budget

What Determines Your Crawl Budget?

Google allocates crawl budget based on two things: how fast your server can handle requests without slowing down, and how much value Google sees in your pages.

Crawl rate limit is Google's self-imposed throttle. If your server responds slowly, Googlebot backs off to avoid degrading your site's performance for real users. Faster servers = more crawling.

Crawl demand is driven by how popular your URLs are (backlinks, traffic) and how frequently they change. A popular homepage may be crawled daily, while a deep archive page from 2012 might be crawled monthly, or never.

Signs You Have a Crawl Budget Problem

Most sites don't have a crawl budget problem. If you have fewer than about 1,000 pages, Google will usually crawl all of them regularly.

You need to care about crawl budget if new pages take weeks to appear in search results, if you have thousands of URLs but low organic traffic, or if Google Search Console shows crawl anomalies such as a sudden drop in crawl requests or a spike in server errors.

The clearest signal is in Search Console's Crawl Stats report — compare daily crawl volume against your total page count. If Googlebot is visiting a fraction of your pages, you have a problem worth solving.
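The comparison above can be sketched in a few lines of Python. This is a rough illustration, not a Google formula: the function name and the sample numbers are hypothetical, and Crawl Stats counts every request (including images, scripts, and repeat fetches), so the ratio is only a first-pass signal.

```python
def crawl_coverage(daily_crawl_requests: list[int], total_pages: int) -> float:
    """Return average daily crawl requests as a fraction of total pages."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if not daily_crawl_requests:
        raise ValueError("need at least one day of crawl data")
    average = sum(daily_crawl_requests) / len(daily_crawl_requests)
    return average / total_pages


# Hypothetical numbers: one week of crawl requests for a 50,000-page site.
week = [1200, 950, 1100, 1300, 1000, 900, 1250]
ratio = crawl_coverage(week, total_pages=50_000)
print(f"Average daily crawl covers {ratio:.1%} of the site")
```

A low ratio on a large site is a prompt to dig further, not proof of a problem by itself.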

How to Optimise Crawl Budget

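The fastest wins come from blocking what shouldn't be crawled. Use robots.txt to block tag archives, filtered product pages, and admin URLs. Add noindex to thin pages you can't delete, but don't also block those pages in robots.txt: Googlebot has to crawl a page to see its noindex tag.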
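Before deploying robots.txt rules, it can help to test them against a few sample URLs. The sketch below uses Python's standard `urllib.robotparser`; the rules and URLs are hypothetical placeholders. One caveat: the standard-library parser does simple prefix matching and does not implement Google's wildcard syntax (`*`, `$`), so treat it as a sanity check for plain path rules only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the paths are placeholders for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /admin/
Disallow: /products/filter/
"""


def blocked_paths(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> dict[str, bool]:
    """Map each URL to True if the rules block it for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: not parser.can_fetch(agent, url) for url in urls}


sample_urls = [
    "https://example.com/tag/shoes",
    "https://example.com/products/red-dress",
    "https://example.com/admin/login",
]
for url, blocked in blocked_paths(ROBOTS_TXT, sample_urls).items():
    print("BLOCKED" if blocked else "allowed", url)
```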

Consolidate duplicate content with canonical tags. Fix redirect chains so Googlebot doesn't waste hops. Remove dead URLs from your sitemap — only include canonical, indexable pages.
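Fixing redirect chains usually means pointing every old URL straight at its final destination. Here is a minimal sketch, assuming you have already exported your redirects as a source-to-target mapping (the mapping and function names are hypothetical):

```python
def resolve_chain(url: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Follow url through the redirect map and return every URL visited."""
    chain = [url]
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in chain:
            raise ValueError(f"redirect loop at {target}")
        if len(chain) > max_hops:
            raise ValueError(f"chain from {url} exceeds {max_hops} hops")
        chain.append(target)
    return chain


def flatten(redirects: dict[str, str]) -> dict[str, str]:
    """Rewrite every redirect so it points directly at its final destination."""
    return {source: resolve_chain(source, redirects)[-1] for source in redirects}


# Hypothetical redirect map containing a two-hop chain.
redirects = {"/old": "/interim", "/interim": "/new"}
print(resolve_chain("/old", redirects))
print(flatten(redirects))
```

After flattening, each legacy URL costs Googlebot one hop instead of several.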

On the server side: improve response time, reduce server errors, and ensure your most important pages are reachable within 3 internal links from the homepage.
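Link depth is easy to measure if you have a crawl export of your internal links. The sketch below runs a breadth-first search from the homepage and lists pages that sit more than 3 links deep or can't be reached at all. The link graph and function names are hypothetical.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search: fewest internal links needed to reach each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links: dict[str, list[str]], home: str, max_depth: int = 3) -> list[str]:
    """Pages deeper than max_depth, plus pages unreachable from the homepage."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(p for p in all_pages if depths.get(p, max_depth + 1) > max_depth)


# Hypothetical internal link graph: page -> pages it links to.
site = {
    "/": ["/shop"],
    "/shop": ["/shop/dresses"],
    "/shop/dresses": ["/shop/dresses/red"],
    "/shop/dresses/red": ["/shop/dresses/red/size-8"],
    "/orphan": [],
}
print(too_deep(site, "/"))
```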

CRAWL BUDGET FORMULA

Crawl Budget = Crawl Rate Limit × Crawl Demand

Treat the formula as a mental model rather than a literal calculation. Crawl Rate Limit is the maximum speed at which Googlebot can crawl without overloading your server, measured in parallel connections and requests per second. Crawl Demand reflects how much Google prioritises your URLs based on popularity (backlinks, traffic) and freshness (how recently content changed). Both factors must be high to maximise the pages Googlebot processes in a given window.

✓ DO

✓ Submit an XML sitemap containing only canonical, indexable URLs
✓ Use robots.txt to block faceted navigation, session IDs, and admin directories
✓ Improve server response times (target TTFB under 200ms) to increase crawl rate limit
✓ Keep your most important pages within 3 clicks of the homepage
✓ Monitor the Crawl Stats report in Google Search Console weekly for anomalies
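The first item on the list can be automated. Here is a minimal sketch that builds a sitemap from crawl data and keeps only pages that return 200, are self-canonical, and aren't noindexed. The `Page` record and its fields are assumptions about what your crawler exports, not a standard format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical crawl record for one URL."""
    url: str
    status: int
    canonical: str
    noindex: bool = False


def is_sitemap_worthy(page: Page) -> bool:
    """Keep only live, self-canonical, indexable pages."""
    return page.status == 200 and page.canonical == page.url and not page.noindex


def build_sitemap(pages: list[Page]) -> str:
    """Return sitemap XML containing only the pages that pass the filter."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if is_sitemap_worthy(page):
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")


pages = [
    Page("https://example.com/dresses", 200, "https://example.com/dresses"),
    Page("https://example.com/dresses?sort=price", 200, "https://example.com/dresses"),
    Page("https://example.com/old-sale", 301, "https://example.com/old-sale"),
    Page("https://example.com/thin-page", 200, "https://example.com/thin-page", noindex=True),
]
print(build_sitemap(pages))
```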

✗ DON'T

✗ Include redirected or noindexed URLs in your XML sitemap
✗ Block JavaScript or CSS files in robots.txt (Googlebot needs them to render pages)
✗ Let redirect chains exceed two hops, wasting crawl budget on intermediate URLs
✗ Rely on crawl budget fixes alone if your content quality is the real indexing issue
✗ Ignore 5xx server errors, which signal Googlebot to back off and crawl less frequently
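To see how often Googlebot hits server errors, you can summarise status codes from your access logs. The sketch below assumes the common "combined" log format and matches on the user-agent string only; user agents can be spoofed, and verifying real Googlebot traffic (for example via reverse DNS) is out of scope here. The sample log lines are made up.

```python
import re
from collections import Counter

# Assumes combined log format: "METHOD path protocol" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(r'"[A-Z]+ \S+ [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$')


def googlebot_status_classes(log_lines: list[str]) -> Counter:
    """Count Googlebot requests per status class (2xx, 3xx, 4xx, 5xx)."""
    counts: Counter = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line.strip())
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")[0] + "xx"] += 1
    return counts


def server_error_share(counts: Counter) -> float:
    """Fraction of counted requests that returned a 5xx status."""
    total = sum(counts.values())
    return counts["5xx"] / total if total else 0.0


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [08/Jun/2026:10:00:00 +0000] "GET /dresses HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [08/Jun/2026:10:00:01 +0000] "GET /coats HTTP/1.1" 503 0 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [08/Jun/2026:10:00:02 +0000] "GET /hats HTTP/1.1" 500 0 "-" "{GOOGLEBOT}"',
    '203.0.113.5 - - [08/Jun/2026:10:00:03 +0000] "GET /dresses HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    f'66.249.66.1 - - [08/Jun/2026:10:00:04 +0000] "GET /old HTTP/1.1" 301 0 "-" "{GOOGLEBOT}"',
]
counts = googlebot_status_classes(sample_log)
print(dict(counts), f"5xx share: {server_error_share(counts):.0%}")
```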

SITES THAT NEED CRAWL BUDGET MANAGEMENT VS. SITES THAT DON'T

Crawl Budget Is Critical	Crawl Budget Is Not a Priority
10,000+ indexable URLs	Fewer than ~1,000 pages
New pages take weeks to appear in Search Console	New pages indexed within 1–3 days
Large e-commerce with faceted navigation	Small business or blog site
News or content sites publishing dozens of articles daily	Static brochure site updated monthly
GSC Crawl Stats show only a fraction of pages crawled per day	GSC shows consistent daily crawl coverage
Multiple redirect chains and legacy URL structures	Clean, flat URL architecture

CRAWL BUDGET AUDIT CHECKLIST

Check Google Search Console Crawl Stats: compare average daily crawls to total page count

Audit your XML sitemap — remove redirected, noindexed, and 4xx URLs

Identify and flatten redirect chains longer than one hop

Block low-value URL patterns in robots.txt (filters, tags, pagination, session parameters)

Add noindex to thin or duplicate pages you cannot remove entirely

Review internal link depth — ensure key pages are reachable within 3 clicks

Measure server TTFB and resolve any response time issues above 500ms

Check the Page indexing report (formerly Coverage) in GSC for "Crawled - currently not indexed" pages as a waste signal

REAL-WORLD EXAMPLE

How an E-commerce Site Recovered Indexing by Cutting 400,000 URLs

A large fashion retailer had over 600,000 URLs, the majority generated by faceted navigation (colour, size, and sort-order filter combinations). Googlebot was spending the bulk of its crawl budget on these parameter-driven pages, leaving thousands of new product and category pages unindexed for months. After blocking filter parameters in robots.txt and consolidating the remaining duplicate pages with canonical tags (Googlebot can't read a canonical tag on a URL it is blocked from crawling), the crawlable URL pool dropped to under 200,000. Within eight weeks, Google Search Console showed a 3× increase in new product pages indexed, and organic traffic to new seasonal lines improved materially within the same quarter.
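To estimate how much of your own URL pool comes from filter combinations, you can strip the filter parameters and count what's left. This is a rough sketch: the parameter names (`colour`, `size`, `sort`) and the sample URLs are hypothetical, so swap in whatever your platform actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical filter parameter names; adjust to your own site.
FILTER_PARAMS = frozenset({"colour", "size", "sort"})


def strip_filters(url: str, filter_params: frozenset = FILTER_PARAMS) -> str:
    """Return the URL with filter parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in filter_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def crawlable_pool(urls: list[str]) -> set[str]:
    """Distinct URLs that remain once filter parameters are ignored."""
    return {strip_filters(url) for url in urls}


urls = [
    "https://example.com/dresses",
    "https://example.com/dresses?colour=red",
    "https://example.com/dresses?colour=red&size=8",
    "https://example.com/dresses?sort=price&page=2",
    "https://example.com/coats?size=10",
]
pool = crawlable_pool(urls)
print(f"{len(urls)} discovered URLs collapse to {len(pool)} without filters")
```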

HOW GOOGLE'S APPROACH TO CRAWL BUDGET HAS EVOLVED

2009

Crawl-Delay Directive Introduced

Some search engines, including Bing and Yahoo, honoured the nonstandard crawl-delay directive in robots.txt, giving site owners a basic mechanism to throttle crawlers and protect server performance. Googlebot does not support crawl-delay and ignores the directive.

2017

Google Officially Defines 'Crawl Budget'

Google's Gary Illyes published the first official blog post explaining crawl budget as the combination of crawl rate limit and crawl demand, giving SEOs a formal framework for the first time.

2020

Crawl Stats Report Added to Search Console

Google launched a redesigned Crawl Stats report in Search Console, surfacing daily crawl data, response codes, and file-type breakdowns, which made basic crawl budget analysis possible without log file analysis.

2022

Google Updates Crawl Budget Documentation for JavaScript

Google clarified that JavaScript-heavy pages go through a two-stage crawl-and-render process. Rendering requires fetching additional resources, which can raise the crawl budget cost of client-side rendered content compared with server-side HTML.

Frequently Asked Questions

Does crawl budget affect rankings?

Not directly. Crawl budget affects indexation, which affects whether your pages can rank at all. If a page isn't crawled, it can't be indexed, and if it isn't indexed, it can't rank. So indirectly, yes: poor crawl budget management can suppress rankings on large sites.

How do I check my crawl budget?

Go to Google Search Console → Settings (bottom left) → Crawl Stats. You'll see average daily crawl volume, response time breakdown, and crawl requests by response code. Compare the daily crawl number against your total indexed page count.

Should I block pages in robots.txt to save crawl budget?

Only block pages you're confident never need to be crawled. robots.txt prevents crawling, but Google may still index a blocked URL if it has external links pointing to it. For pages you want deindexed, use noindex instead and leave them crawlable so Googlebot can see the tag. Use robots.txt for pages that genuinely waste crawl budget: parameter URLs, faceted navigation, session IDs.

Sources & Further Reading

1. Google Search Central, Crawl Budget documentation
2. Screaming Frog SEO Spider documentation
3. Advanced Web Ranking CTR Study, 2024
