Netflix's 'Architecture by Measurement'

Netflix's backend now processes over 2 trillion events daily, and the company uses that data to drive its architectural decisions. Its "measure, then design" approach draws on this firehose of observability data to detect scaling issues proactively and to validate new infrastructure choices against real-world traffic. This data-driven culture is key to how Netflix automates scaling, deployment, and remediation.
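
To put the daily figure in per-second terms, here is a small back-of-the-envelope calculation. It treats the "over 2 trillion" number as a floor and assumes events are spread evenly across the day, which real traffic is not, so the result is only an average and says nothing about peaks. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Average rate, assuming events arrive evenly over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


# "Over 2 trillion events daily" from the article, used here as a lower bound.
daily_events = 2_000_000_000_000
rate = average_events_per_second(daily_events)
print(f"at least {rate:,.0f} events per second on average")
```

That works out to roughly 23 million events per second on average.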

This data-driven approach is a significant evolution from Netflix's earlier architecture. Around 2015, the company shifted from a batch-processing system built on Chukwa and Hadoop, which had latencies of up to 10 minutes, to a real-time, streaming-first architecture in order to keep up with its rapidly growing global user base and the explosion of microservices. The shift was necessary to handle the jump from 45 billion daily events in 2011 to over 500 billion by 2015.

At the core of this real-time data infrastructure are two distinct, homegrown platforms: Keystone and Mantis. Keystone is the primary data backbone, built for analytics and business intelligence; it processes petabytes of data daily for tasks like user behavior analysis. Powered by Apache Flink and Kafka, it supports complex data processing and routing to sinks such as Amazon S3 and Elasticsearch. Mantis, on the other hand, is designed for operational use cases, providing real-time insight into the health of microservices for functions like anomaly detection and alerting. It offers a reactive, stream-processing-as-a-service model that lets developers tap into event streams on demand without the high cost of constant, full-volume data processing. Separating analytical and operational concerns in this way allows engineers to use the right tool for the job.

The decisions born from this firehose of data are concrete and impactful. For instance, the migration to Java 17 was justified by direct measurement of a 20% CPU performance improvement. When Netflix built its 8-billion-node Real-Time Distributed Graph, observability data led the team to choose Cassandra over a dedicated graph database, a decision validated by performance at 2 million reads and 6 million writes per second.
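
The on-demand model attributed to Mantis above can be pictured with a small sketch. This is not Mantis's API; the class `OnDemandStream`, its methods, and the event fields are all hypothetical, chosen only to illustrate the idea that a source does no per-event work until someone subscribes with a filter.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]


class OnDemandStream:
    """Hypothetical event source that inspects events only while subscribed to."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Tuple[Predicate, List[Event]]] = {}
        self.evaluated = 0  # number of events that were actually inspected

    def subscribe(self, name: str, predicate: Predicate, sink: List[Event]) -> None:
        """Register a filter; matching events are appended to `sink`."""
        self._subscribers[name] = (predicate, sink)

    def unsubscribe(self, name: str) -> None:
        self._subscribers.pop(name, None)

    def publish(self, event: Event) -> None:
        if not self._subscribers:
            return  # nobody is listening, so the event is dropped at the source
        self.evaluated += 1
        for predicate, sink in self._subscribers.values():
            if predicate(event):
                sink.append(event)


stream = OnDemandStream()
stream.publish({"service": "playback", "status": 500})  # skipped: no subscribers yet

errors: List[Event] = []
stream.subscribe("errors", lambda e: e["status"] >= 500, errors)
stream.publish({"service": "playback", "status": 200})
stream.publish({"service": "playback", "status": 503})
stream.unsubscribe("errors")

print(stream.evaluated, errors)
```

In this toy version the first event costs nothing because no query is active, and only the 503 event reaches the subscriber. A production system would push such filters out to many event-producing services rather than run them in one process.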

This investment in a bespoke observability stack, which costs less than 5% of total infrastructure spend, is considered one of Netflix's highest-leverage architectural investments. The platform processes 1.5 petabytes of log data and 700 billion distributed traces daily. This capability allows for proactive issue detection and has demonstrably reduced operational complexity and improved transaction approval rates in areas like payment processing.

The "measure everything" philosophy has also democratized data analysis within Netflix. By creating a Data Mesh platform with Streaming SQL that wraps the complexity of Apache Flink, Netflix enabled non-infrastructure teams to create over 1,200 SQL-based processors in a single year, processing 100 million events per second across thousands of pipelines.

Building in-house has been a strategic choice. Internal analysis showed that commercial log analytics vendors could not match the cost-efficiency of Netflix's custom-built streaming log analytics system. The culture of building tools to solve specific, measured problems has also produced a suite of influential open-source projects, including Spinnaker for continuous delivery and Hystrix for fault tolerance, both born from the need to manage and observe a complex microservices landscape.

The entire architecture, from the API gateway down to data storage, is a product of this incremental, data-validated evolution. Decisions are not made in a vacuum; they are responses to measured needs, from A/B testing product features at 450,000 requests per second to optimizing video encoding workflows based on observed costs.
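
To give a feel for what a SQL-based stream processor does, the sketch below mimics a filter-and-project query over an unbounded event stream in plain Python. It does not use Flink or Netflix's Data Mesh; the function `streaming_select`, the `playback_events` stream name, and the event fields are all hypothetical, and the SQL in the comment is only an illustration of the kind of query such a processor might express.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Event = Dict[str, Any]


def streaming_select(
    events: Iterable[Event],
    columns: List[str],
    where: Callable[[Event], bool],
) -> Iterator[Event]:
    """Lazily filter and project events, one at a time, as they arrive."""
    for event in events:
        if where(event):
            yield {column: event[column] for column in columns}


# Roughly the shape of (hypothetical query and stream name):
#   SELECT device, latency_ms FROM playback_events WHERE latency_ms > 500
playback_events = [
    {"device": "tv", "latency_ms": 120, "region": "us"},
    {"device": "phone", "latency_ms": 840, "region": "eu"},
    {"device": "tv", "latency_ms": 610, "region": "us"},
]

slow = streaming_select(
    playback_events,
    columns=["device", "latency_ms"],
    where=lambda e: e["latency_ms"] > 500,
)
for row in slow:
    print(row)
```

Because the function is a generator, it never needs the whole stream in memory, which is the property that lets a declarative query run continuously over live events.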

