Uber's Data System Latency Analyzed

A new analysis explores the "substitution paradox" in distributed systems, using Uber's SLOG data store as a case study. The SLOG system reportedly achieves both low latency and strict serializability, a combination that challenges long-held assumptions in distributed database design and impacts real-time ML serving.

- Strict serializability ensures that transactions behave as if they are executed sequentially on a single machine, which simplifies application logic and prevents bugs. However, this typically comes at the cost of higher latency for writes and reduced transactional throughput in geo-replicated systems.
- Uber's SLOG (Serializable, Low-latency, Geo-replicated transactions) aims to achieve all three: strict serializability, low-latency writes, and high transactional throughput. It accomplishes this by leveraging data access locality, assigning a "home" region to each piece of data to allow for rapid local reads and writes without cross-region communication.
- For transactions that need to access data from multiple regions ("multi-home" transactions), SLOG uses a deterministic architecture to move most of the communication outside of conflict boundaries, which helps maintain high throughput.
- The challenge of achieving low latency is a significant concern in real-time recommendation systems, which must process millions of user interaction events per second. These systems need to handle high-velocity data streams while providing timely and relevant recommendations.
- Real-time machine learning architectures often rely on event-driven models where a continuous stream of data is fed into the model for ongoing training and inference. A key component of this architecture is a feature store that holds reference data, which must have extremely low latency to be effective.
- Uber's broader infrastructure has evolved from a monolithic architecture to a microservices-based one to support its global scale and real-time operational demands. This architecture is deployed across multiple geographic regions and utilizes various database technologies, including schemaless databases built on MySQL, as well as Riak and Cassandra for high-availability and low-latency access.
- The "eight fallacies of distributed computing," originally outlined by L. Peter Deutsch at Sun Microsystems, are a set of incorrect assumptions that new developers often make about distributed systems. These fallacies include the beliefs that the network is reliable, latency is zero, and bandwidth is infinite, all of which are critical considerations in designing systems like SLOG.
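
To make the "home" region idea concrete, here is a minimal Python sketch of how a transaction might be classified as single-home or multi-home. It is an illustration under stated assumptions, not SLOG's actual implementation: the `HOME_REGION` table, the key names, the region names, and the `classify` and `route` helpers are all hypothetical, and forwarding a single-home transaction to a remote home region is assumed here for illustration.

```python
from dataclasses import dataclass

# Hypothetical mapping from each data key to its "home" region.
# Keys and region names are invented for illustration only.
HOME_REGION = {
    "rider:1": "us-west",
    "rider:2": "us-west",
    "driver:9": "eu-central",
}


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    keys: tuple  # the data keys this transaction reads or writes


def home_regions(txn):
    """Return the set of home regions touched by the transaction's keys."""
    return {HOME_REGION[key] for key in txn.keys}


def classify(txn):
    """Label a transaction by how many home regions its data spans."""
    return "single-home" if len(home_regions(txn)) == 1 else "multi-home"


def route(txn, origin_region):
    """Describe where the transaction would be handled (assumed policy)."""
    regions = home_regions(txn)
    if len(regions) == 1:
        (home,) = regions
        # Data homed in the origin region needs no cross-region communication.
        hop = "local" if home == origin_region else "forwarded"
        return f"{hop} single-home at {home}"
    # Data spans regions, so some cross-region coordination is unavoidable.
    return "multi-home across " + ", ".join(sorted(regions))


if __name__ == "__main__":
    t1 = Transaction("t1", ("rider:1", "rider:2"))
    t2 = Transaction("t2", ("rider:1", "driver:9"))
    print(classify(t1), "|", route(t1, "us-west"))
    print(classify(t2), "|", route(t2, "us-west"))
```

In this sketch, only the second transaction touches data homed in more than one region, so it is the only one that would pay for cross-region coordination. The first stays entirely within its home region, which is the locality the design relies on.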
