Search Console API: pulling >50k rows/day without hitting limits
D
DevOpsDana Gold
Asked 3h ago · Tools, APIs & Automation
Building a daily pipeline off the Search Analytics API and I keep hitting the 50,000-row response cap and the occasional 429. Sites have 200k+ query+page rows/day.
What's the right architecture in 2026 — pagination, batching, or something I'm missing entirely?
Recommended answer
T
TechnicalTina Platinum
Tools, APIs & Automation expert
Two layers. If you just need the data, stop fighting the API — turn on the Bulk data export to BigQuery (Settings → Bulk data export). It's raw, unsampled, daily, with no row caps. That's the right answer for >50k rows/day.
If you need the API (ad-hoc or pre-export history): the 50k cap is per request, not per day. Paginate with startRow in 25,000-row steps, batch one request per date so result sets stay small and resumable, respect the 1,200 QPM / site quota, and add exponential backoff on 429/503. Land everything in BigQuery or Postgres — Sheets dies at this volume.
Marked helpful by DevOpsDana
187 replies
C
crawl_carlPlatinumHelpful· 3h ago
The 50k cap is per-request, not per-day. Paginate with startRow in steps of 25,000 and loop until you get fewer rows than the limit.
S
schema_steveGoldHelpful· 3h ago
Batch by day — one request per date keeps each result set small and lets you resume cleanly if a job dies.
S
search_samDiamondHelpful· 2h ago
If you need everything, use the bulk export to BigQuery. Raw, unsampled, daily, no caps. The API is the wrong tool for full extracts.
N
newbie_nateBronze· 2h ago
Wait — there's a native BigQuery export now? Since when?
T
TechnicalTinaPlatinum· 2h ago
@newbie_nate yes, Settings → Bulk data export. It backfills nothing, so set it up before you need the history.
D
DevOpsDanaGoldHelpful· 2h ago
Quota is 1,200 QPM / 30,000 QPD per site. Add backoff on 429s or you'll get throttled mid-run.
C
crawl_carlPlatinum· 1d ago
Minimal backoff pattern we use:
for attempt in range(6):
try:
return svc.searchanalytics().query(...).execute()
except HttpError as e:
if e.resp.status in (429, 503):
time.sleep(2 ** attempt) # 1,2,4,8,16,32s
else:
raise
S
serp_saraSilver· 1d ago
Set dataState='all' to include fresh (partial) data, or 'final' for stable numbers — just don't mix them in one job.
S
schema_steveGoldHelpful· 1d ago
Watch sampling: summing API rows undercounts total clicks vs the UI. The BigQuery export avoids that entirely.
D
DevOpsDanaGold· 1d ago
Dimensions drive cardinality: query+page+date explodes row counts fast. Pull only what you'll actually use.
S
search_samDiamondHelpful· 23h ago
For >50k/day you usually want page+date first, then join to a narrower query-level pull. Don't request the full cross-product.
N
newbie_nateBronze· 20h ago
Storing it where? Sheets falls over at this volume.
T
TechnicalTinaPlatinum· 18h ago
@newbie_nate BigQuery or Postgres. We land raw daily partitions in BigQuery and build views on top; Sheets is just the final dashboard.
C
crawl_carlPlatinum· 14h ago
Dedupe on (date, page, query) — retried jobs create overlaps. Upsert, don't blind-append.
D
DevOpsDanaGold· 10h ago
Exactly what I needed — switching to the BigQuery bulk export and keeping the API only for ad-hoc pulls. Thanks everyone.
Sign in to read the full discussion
You're viewing the public preview. 172 more replies — including code samples, benchmarks and follow-ups — are visible to members. Joining is free.