System‑design drills: chat history & collapsing

Two new breakdowns of interview‑style problems are out. The first covers designing chat history for 10k+ messages with reverse infinite scroll, cursor pagination, and local caching for sub‑200ms loads. The second covers request‑collapsing patterns that prevent DB overload during 1M‑user flash sales using proxy locking and fan‑out. Both posts are framed as SDE2‑level system design practice for Big Tech interviews. (AI Chat History system design thread, Request Collapsing breakdown)
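
As a rough illustration of cursor pagination for reverse infinite scroll, the sketch below pages backwards through a chat using Python's built-in sqlite3 module. The schema (a `messages` table with `id`, `chat_id`, `body`), the helper names `open_demo_db` and `fetch_older`, and the use of an increasing `id` as the cursor are assumptions made for this example, not details taken from the linked thread.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_demo_db(n_messages: int = 25) -> sqlite3.Connection:
    """Create an in-memory chat table with n_messages rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, chat_id INTEGER NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO messages (id, chat_id, body) VALUES (?, 1, ?)",
        [(i, f"message {i}") for i in range(1, n_messages + 1)],
    )
    # Composite index so the query can seek to the cursor instead of scanning.
    conn.execute("CREATE INDEX idx_messages_chat_id ON messages (chat_id, id)")
    return conn


def fetch_older(
    conn: sqlite3.Connection,
    chat_id: int,
    before_id: Optional[int] = None,
    limit: int = 10,
) -> Tuple[List[Tuple[int, str]], Optional[int]]:
    """Return (page, next_cursor); page is newest-first.

    before_id=None loads the newest page. Otherwise only rows with
    id < before_id are returned, so no OFFSET is needed.
    """
    sql = "SELECT id, body FROM messages WHERE chat_id = ?"
    params: list = [chat_id]
    if before_id is not None:
        sql += " AND id < ?"
        params.append(before_id)
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(limit + 1)  # fetch one extra row to detect whether more exist
    rows = conn.execute(sql, params).fetchall()
    page = rows[:limit]
    next_cursor = page[-1][0] if len(rows) > limit else None
    return page, next_cursor
```

A client would call `fetch_older(conn, chat_id)` on open, then pass the returned cursor as `before_id` each time the user scrolls up, stopping when the cursor comes back as `None`.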

Fastly’s documentation names the collapsed request the “champion” and shows how a single origin fetch can satisfy many simultaneous edge requests, preventing origin overload. (fastly.com/documentation/guides/concepts/cache/request-collapsing/) Vercel’s CDN explicitly collapses concurrent invocations for uncached paths, so the platform invokes the origin once and serves the stored response to the other simultaneous requests. (vercel.com/docs/incremental-static-regeneration/request-collapsing) Nginx’s proxy_cache_lock is a widely used coalescing mechanism, but community tests have measured roughly 500ms of added tail latency in some configurations where the lock is misconfigured and contention is high. (nejc.blog/nginx-proxy-cache-lock-request-coalescing/) Cloudflare published a lock‑free, probabilistic revalidation approach (“roll the die” sampling) as an alternative to strict cache locks, aiming to reduce origin pressure while keeping latency predictable under high load. (blog.cloudflare.com/sometimes-i-cache/)

Keyset/cursor pagination benchmarks show orders‑of‑magnitude gains versus OFFSET/LIMIT for deep history scans: one benchmark reported jumping to page 1000 in about 45ms, and other tests reported 17×–177× speedups depending on data and index strategy. (0x.run/pagination-offset-vs-cursor; milanjovanovic.tech/blog/understanding-cursor-pagination-and-why-its-so-fast-deep-dive)

Browser‑side persistent stores for local caching vary: IndexedDB is the standard persistent API across modern browsers, and OPFS‑backed storage (used by projects like RxDB) can be several times faster than classic IndexedDB for large local datasets. (developer.mozilla.org/docs/Web/API/IndexedDB_API/Using_IndexedDB; rxdb.info/slow-indexeddb.html)

Redis is designed for sub‑millisecond average responses in optimal conditions, but without tuning, production p99 latency or spikes can reach tens or hundreds of milliseconds, so cache instrumentation and SLOWLOG tracing are recommended when aiming for strict sub‑200ms service‑level latency. (redis.io/docs/latest/operate/oss_and_stack/management/optimization/latency/; about.gitlab.com/blog/how-we-diagnosed-and-resolved-redis-latency-spikes/)
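
The request‑collapsing idea discussed above can be sketched in‑process as a single‑flight helper: the first caller for a key performs the fetch, and concurrent callers for the same key wait for that result instead of hitting the backend themselves. This is a minimal Python sketch under stated assumptions. The `SingleFlight` class, its `do` method, and the `collapsed` counter are hypothetical names for illustration, and it is not a description of how Fastly, Vercel, Nginx, or Cloudflare implement collapsing.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared by the leader and the followers of one in-flight fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Collapse concurrent fetches for the same key into one call (sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}
        self.collapsed = 0  # number of callers that reused another caller's fetch

    def do(self, key: str, fetch: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
            else:
                self.collapsed += 1

        if leader:
            try:
                call.result = fetch()  # the only backend hit for this burst
            except BaseException as exc:
                call.error = exc  # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()  # wake every waiting follower
        else:
            call.done.wait()

        if call.error is not None:
            raise call.error
        return call.result
```

Results are not cached after the fetch completes, so a request arriving later triggers a fresh fetch. In practice this helper would usually sit in front of a cache rather than replace one.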
