Netflix's Cross-Device Resume System
A technical blog post details the architecture behind Netflix's playback resume feature and shows how complex it becomes at scale. To restore playback state reliably across devices, with low latency, for millions of concurrent users, the system relies on distributed state management, deduplication, and eventual consistency.
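Deduplication matters because client progress updates ("heartbeats") can be retried or delivered more than once. The following is a minimal sketch of one way to do it. It assumes each heartbeat carries a hypothetical per-device sequence number `seq`, which the post summarized here does not describe. The server keeps the highest `seq` it has seen for each device and drops anything at or below that value, so retries and late deliveries are ignored. `Heartbeat` and `HeartbeatDeduplicator` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Heartbeat:
    # Hypothetical payload: field names are illustrative, not Netflix's schema.
    device_id: str
    seq: int            # assumed to increase monotonically per device
    position_s: float   # playback position in seconds


class HeartbeatDeduplicator:
    """Drops duplicate or stale heartbeats using a per-device high-water mark."""

    def __init__(self) -> None:
        self._last_seq: Dict[str, int] = {}

    def accept(self, hb: Heartbeat) -> bool:
        last = self._last_seq.get(hb.device_id)
        if last is not None and hb.seq <= last:
            return False  # retry or out-of-order delivery: already superseded
        self._last_seq[hb.device_id] = hb.seq
        return True


dedup = HeartbeatDeduplicator()
stream = [
    Heartbeat("tv", 1, 10.0),
    Heartbeat("tv", 2, 20.0),
    Heartbeat("tv", 2, 20.0),   # retried duplicate
    Heartbeat("tv", 1, 10.0),   # late, stale delivery
    Heartbeat("phone", 1, 5.0),
]
accepted = [hb for hb in stream if dedup.accept(hb)]
print([(hb.device_id, hb.seq) for hb in accepted])
# [('tv', 1), ('tv', 2), ('phone', 1)]
```

A per-device high-water mark uses constant memory per device. It does assume that sequence numbers never go backward, for example after an app reinstall, and that is a case a real system would have to handle.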
- The client application sends a "heartbeat" to Netflix's servers every few seconds reporting the current playback position, so that if the app crashes, the saved progress is at most a few seconds behind.
- To handle concurrent viewing on multiple devices, the system uses a "last write wins" strategy: the position reported by the most recently active device is the one that gets saved.
- The server side uses multi-tier storage. It writes playback state to a fast in-memory cache for low-latency reads and asynchronously persists it to a distributed database such as Cassandra for long-term durability.
- User state is replicated across Netflix's global data centers, which keeps resume latency low regardless of the user's geographic location.
- The system is built to degrade gracefully: for example, if the in-memory cache fails, the service can fall back to the persistent database layer to retrieve the last known position.
- Playback data, including where users pause and resume, is a key input to the machine learning models that rank and personalize the "Continue Watching" row on the Netflix homepage.
- While the resume system's state management runs on AWS, the video streams themselves are delivered by Open Connect, Netflix's proprietary content delivery network (CDN), which places servers directly inside internet service provider (ISP) networks.
- The feature sits within a broader microservices architecture in which the playback state service is decoupled from other services such as video transcoding, licensing, and recommendations.
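The heartbeat design implies a simple bound. If the client reports every `interval_s` seconds, a crash loses at most about one interval of progress. The sketch below uses a fake clock so that it is deterministic. `HeartbeatClient`, `tick`, and the 10-second interval are illustrative assumptions, since the post says only "every few seconds". The `send` callable stands in for whatever network call a real client would make.

```python
from typing import Callable, List, Optional


class HeartbeatClient:
    """Reports the playback position at most once per interval (hypothetical client)."""

    def __init__(self, interval_s: float, send: Callable[[float], None]) -> None:
        self.interval_s = interval_s
        self._send = send                            # stands in for a network call
        self._last_sent_at: Optional[float] = None   # clock time of the last heartbeat

    def tick(self, now_s: float, position_s: float) -> None:
        # Called from the player's update loop; emits only once an interval has elapsed.
        if self._last_sent_at is None or now_s - self._last_sent_at >= self.interval_s:
            self._send(position_s)
            self._last_sent_at = now_s


saved: List[float] = []  # positions the server has recorded
client = HeartbeatClient(interval_s=10.0, send=saved.append)

# Simulate playback sampled once per second; the app crashes at t = 37 s.
for t in range(0, 38):
    client.tick(now_s=float(t), position_s=float(t))

crash_position = 37.0
resume_position = saved[-1]
print(saved, crash_position - resume_position)
# [0.0, 10.0, 20.0, 30.0] 7.0
```

The interval trades freshness for request volume. Halving it halves the worst-case loss but doubles the heartbeat traffic from every active session.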
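"Last write wins" can be modeled as a register whose merge keeps the entry with the newest timestamp. The sketch below is a minimal version under two assumptions. First, each update carries a write timestamp. Second, the device ID serves as a deterministic tie-breaker. The post does not say how Netflix handles ties or clock skew, and `ResumeState` and `lww_merge` are hypothetical names. Because this merge is commutative, associative, and idempotent, replicas in different regions that apply the same updates in any order converge to the same value. That is the eventual-consistency property the summary refers to.

```python
import itertools
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class ResumeState:
    position_s: float
    updated_at_ms: int   # writer's timestamp (clock skew is ignored in this sketch)
    device_id: str       # tie-breaker so equal timestamps resolve deterministically


def lww_merge(a: ResumeState, b: ResumeState) -> ResumeState:
    """Keep whichever state was written last; ties go to the larger device_id.

    Assumes a device never writes two different positions with the same timestamp.
    """
    return max(a, b, key=lambda s: (s.updated_at_ms, s.device_id))


updates = [
    ResumeState(1200.0, 1_000, "tv"),
    ResumeState(300.0, 1_500, "phone"),   # most recent write
    ResumeState(1250.0, 1_400, "tv"),
]

# Every possible delivery order (e.g. at different regional replicas) yields one result.
results = {reduce(lww_merge, order) for order in itertools.permutations(updates)}
print(results)
# {ResumeState(position_s=300.0, updated_at_ms=1500, device_id='phone')}
```

Note the trade-off. The phone's position wins even though the TV had progressed further, because the phone wrote last. This matches "most recently active device," but it means last-write-wins can move a user's resume point backward.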
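The storage and resilience points describe three behaviors: an in-memory cache in front of a durable store, asynchronous persistence, and fallback to the database when the cache fails. Together these can be sketched as a write-behind cache. In this sketch, plain dicts stand in for the cache and for a database such as Cassandra, and a background thread drains a queue of pending writes. `ResumeStore`, `save`, `load`, and `flush` are hypothetical names, not Netflix's API.

```python
import queue
import threading
from typing import Dict, Optional


class ResumeStore:
    """Write-behind store: fast cache writes, async durable writes, DB fallback on reads."""

    def __init__(self) -> None:
        self.cache: Dict[str, float] = {}   # stand-in for an in-memory cache tier
        self.db: Dict[str, float] = {}      # stand-in for a durable store such as Cassandra
        self.cache_up = True                # set to False to simulate a cache outage
        self._pending: queue.Queue = queue.Queue()
        threading.Thread(target=self._persist_loop, daemon=True).start()

    def save(self, profile_id: str, position_s: float) -> None:
        if self.cache_up:
            self.cache[profile_id] = position_s        # low-latency path
        self._pending.put((profile_id, position_s))    # persisted asynchronously

    def load(self, profile_id: str) -> Optional[float]:
        if self.cache_up and profile_id in self.cache:
            return self.cache[profile_id]
        return self.db.get(profile_id)                 # graceful degradation

    def flush(self) -> None:
        self._pending.join()   # block until the background writer has caught up

    def _persist_loop(self) -> None:
        while True:
            key, value = self._pending.get()
            self.db[key] = value
            self._pending.task_done()


store = ResumeStore()
store.save("profile-1", 1834.5)
store.flush()
store.cache_up = False           # simulate losing the cache tier
print(store.load("profile-1"))   # 1834.5, now served from the durable store
```

Anything still in the queue when the process dies is lost. That is the usual price of write-behind persistence, and it is tolerable here because the next heartbeat overwrites the position anyway. A production version would also need retries and a bounded queue, which this sketch omits.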