searchengine.forum

Search Console API: pulling >50k rows/day without hitting limits

D
DevOpsDana Gold
Asked 3h ago · Tools, APIs & Automation

Building a daily pipeline off the Search Analytics API and I keep hitting the 50,000-row response cap and the occasional 429. Sites have 200k+ query+page rows/day.

What's the right architecture in 2026 — pagination, batching, or something I'm missing entirely?

Recommended answer
T
TechnicalTina Platinum
Tools, APIs & Automation expert

Two layers. If you just need the data, stop fighting the API — turn on the Bulk data export to BigQuery (Settings → Bulk data export). It's raw, unsampled, daily, with no row caps. That's the right answer for >50k rows/day.

If you need the API (ad-hoc or pre-export history): the 50k cap is per request, not per day. Paginate with startRow in 25,000-row steps, batch one request per date so result sets stay small and resumable, respect the 1,200 QPM / site quota, and add exponential backoff on 429/503. Land everything in BigQuery or Postgres — Sheets dies at this volume.

Marked helpful by DevOpsDana
187 replies
C
crawl_carlPlatinumHelpful· 3h ago

The 50k cap is per-request, not per-day. Paginate with startRow in steps of 25,000 and loop until you get fewer rows than the limit.

S
schema_steveGoldHelpful· 3h ago

Batch by day — one request per date keeps each result set small and lets you resume cleanly if a job dies.

S
search_samDiamondHelpful· 2h ago

If you need everything, use the bulk export to BigQuery. Raw, unsampled, daily, no caps. The API is the wrong tool for full extracts.

N
newbie_nateBronze· 2h ago

Wait — there's a native BigQuery export now? Since when?

T
TechnicalTinaPlatinum· 2h ago

@newbie_nate yes, Settings → Bulk data export. It backfills nothing, so set it up before you need the history.

D
DevOpsDanaGoldHelpful· 2h ago

Quota is 1,200 QPM / 30,000 QPD per site. Add backoff on 429s or you'll get throttled mid-run.

C
crawl_carlPlatinum· 1d ago

Minimal backoff pattern we use:

for attempt in range(6):
    try:
        return svc.searchanalytics().query(...).execute()
    except HttpError as e:
        if e.resp.status in (429, 503):
            time.sleep(2 ** attempt)   # 1,2,4,8,16,32s
        else:
            raise
S
serp_saraSilver· 1d ago

Set dataState='all' to include fresh (partial) data, or 'final' for stable numbers — just don't mix them in one job.

S
schema_steveGoldHelpful· 1d ago

Watch sampling: summing API rows undercounts total clicks vs the UI. The BigQuery export avoids that entirely.

D
DevOpsDanaGold· 1d ago

Dimensions drive cardinality: query+page+date explodes row counts fast. Pull only what you'll actually use.

S
search_samDiamondHelpful· 23h ago

For >50k/day you usually want page+date first, then join to a narrower query-level pull. Don't request the full cross-product.

N
newbie_nateBronze· 20h ago

Storing it where? Sheets falls over at this volume.

T
TechnicalTinaPlatinum· 18h ago

@newbie_nate BigQuery or Postgres. We land raw daily partitions in BigQuery and build views on top; Sheets is just the final dashboard.

C
crawl_carlPlatinum· 14h ago

Dedupe on (date, page, query) — retried jobs create overlaps. Upsert, don't blind-append.

D
DevOpsDanaGold· 10h ago

Exactly what I needed — switching to the BigQuery bulk export and keeping the API only for ad-hoc pulls. Thanks everyone.

Sign in to read the full discussion

You're viewing the public preview. 172 more replies — including code samples, benchmarks and follow-ups — are visible to members. Joining is free.

Free account · no spam · sign in to reply and vote