70M‑Row Query Thread

A deep thread on handling a massive GET /api/v1/reports query (70M rows) walks through the indexing, partitioning, and caching tradeoffs for latency and cost, all practical tactics for a backend that is scaling up. It also covers real-world choices for monetization and observability once full scans become untenable. (x.com)

For selective filters on wide tables with tens of millions of rows, PostgreSQL's default B-tree index is the usual choice for equality and range predicates, because the planner can use it to avoid scanning the whole relation; the PostgreSQL documentation describes B-tree as the default index type, suited to those comparisons. (postgresql.org/docs/current/indexes-types.html)

Partitioning by a time-unit column, ingestion time, or an integer range can cut the data scanned from the full 70M-row table to a small subset of partitions. BigQuery's partitioned-table docs recommend filtering on the partitioning column so that pruning keeps both latency and query cost down. (cloud.google.com/docs/bigquery/docs/partitioned-tables)

Query splitting and result caching in a front-end tier are a practical mitigation for long single-request scans. Thanos's Query Frontend implements both: it splits long range queries by interval (default 24h) and caches results, which helps prevent OOMs and repeated full work. (github.com/thanos-io/thanos/blob/main/docs/components/query-frontend.md)

Operational observability should include statement-level telemetry. PostgreSQL's pg_stat_activity view and the pg_stat_statements module expose running statements and aggregate statement costs, and Redshift's Query Monitoring Rules let teams log or abort long or IO-heavy queries, for example by canceling queries that exceed a configured time threshold. (postgresql.org/docs/current/monitoring.html) (docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-query-monitoring-rules.html)

Turning synchronous full-scan endpoints into asynchronous export jobs (202 Accepted plus a job ID, then polling or webhook callbacks) is a common production pattern. Zoho, Anduin, and AWS architecture guides document it as a way to avoid blocking API workers by offloading heavy work to job queues. (www.zoho.com/analytics/api/v2/bulk-api/export-data-async.html) (aws.amazon.com/blogs/architecture/managing-asynchronous-workflows-with-a-rest-api/)

When exports become a business decision, platform teams often gate or monetize heavy export APIs through tiered plans or per-call pricing. Industry guidance from Stripe and from API monetization platforms describes tiered export access and usage-based models that raise ARPU while limiting free full-scan load. (stripe.com/resources/more/what-is-api-monetization-heres-how-it-works-and-why-its-so-appealing) (www.getmonetizely.com/articles/data-export-pricing-strategic-approaches-for-monetizing-customer-data-access)
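The asynchronous export pattern mentioned above (202 Accepted plus a job ID, then polling) can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any of the cited vendors implement it: the `ExportJobs` class, the `/exports/{job_id}` status path, and the `run_export` callback are all hypothetical names, and a production service would presumably use a durable queue and job store instead of a thread pool and a dict.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


class ExportJobs:
    """Hypothetical in-memory job registry for long-running exports."""

    def __init__(self, run_export: Callable[[dict], List[dict]], workers: int = 2):
        self._run_export = run_export  # the slow full-scan work (assumed callback)
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, dict] = {}
        self._lock = threading.Lock()

    def submit(self, params: dict) -> Tuple[int, dict]:
        """Enqueue the export and return 202 immediately with a job ID."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "pending", "result": None, "error": None}
        self._pool.submit(self._work, job_id, params)
        return 202, {"job_id": job_id, "status_url": f"/exports/{job_id}"}

    def status(self, job_id: str) -> Tuple[int, dict]:
        """Polling endpoint: report job state without blocking on the work."""
        with self._lock:
            job = self._jobs.get(job_id)
            if job is None:
                return 404, {"error": "unknown job"}
            rows = job["result"]
            return 200, {
                "status": job["status"],
                "row_count": len(rows) if rows is not None else None,
                "error": job["error"],
            }

    def shutdown(self) -> None:
        """Wait for queued jobs to finish (used here to make the demo deterministic)."""
        self._pool.shutdown(wait=True)

    def _work(self, job_id: str, params: dict) -> None:
        # Runs on a worker thread, so the request handler is never blocked.
        self._update(job_id, status="running")
        try:
            rows = self._run_export(params)
            self._update(job_id, status="done", result=rows)
        except Exception as exc:  # record the failure instead of losing it
            self._update(job_id, status="failed", error=str(exc))

    def _update(self, job_id: str, **fields) -> None:
        with self._lock:
            self._jobs[job_id].update(fields)


if __name__ == "__main__":
    # Stand-in for the real 70M-row scan: returns a few fake rows.
    jobs = ExportJobs(lambda p: [{"id": i} for i in range(p["limit"])])
    code, body = jobs.submit({"limit": 3})
    print(code, body["status_url"])
    jobs.shutdown()
    print(jobs.status(body["job_id"]))
```

The design point is that `submit` only records the job and hands it to a worker, so the API worker returns right away; clients then poll the status endpoint (or, in a fuller version, receive a webhook) until the job reports `done` or `failed`.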
