Uber's MLOps Platform 'Michelangelo' Is Now Open Source
Uber just open-sourced its entire internal MLOps platform, Michelangelo. The system is known for its battle-tested, enterprise-grade tooling for managing ML pipelines, from feature stores to model deployment. This move could reshape the MLOps landscape by giving startups and solo developers access to the same scalable infrastructure that powers Uber's global operations.
Uber initiated the development of Michelangelo back in mid-2015 to solve internal challenges. Prior to the platform, data scientists used a wide array of tools, but there was no standardized or reliable path to deploy models into production, often requiring bespoke engineering work for each project.
The platform operates at a significant scale, currently managing around 400 active ML projects and over 5,000 production models. At its peak, Michelangelo serves 10 million real-time predictions per second, powering critical business decisions for services like Uber Eats and rider-driver matching.
A core component is the feature store, named Palette, which hosts over 20,000 features that teams across Uber can share and reuse. The system is designed for the end-to-end ML workflow, from managing data pipelines and training models to deploying them in offline, online, or library-based environments.
Michelangelo has evolved through distinct phases, initially focusing on predictive machine learning with tools like XGBoost and TensorFlow. From 2019 to 2023, it shifted to better support deep learning, integrating frameworks like PyTorch, and has recently expanded again to include capabilities for generative AI and LLMOps.
This isn't Uber's first major open-source contribution. The company has a history of releasing impactful developer tools, including the distributed tracing system Jaeger, which was contributed to the Cloud Native Computing Foundation (CNCF), and the distributed deep learning framework Horovod.
The platform itself is built on a combination of in-house components and popular open-source systems. Its architecture leverages technologies like Apache Spark for large-scale training, Cassandra for online serving, and Samza for stream processing.
The open-source roadmap for Michelangelo aims to provide the broader community with access to the infrastructure that powers both predictive and generative AI at Uber. This includes components for the full ML lifecycle, from feature preparation and model training to production monitoring.
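For readers unfamiliar with feature stores, the idea of teams sharing and reusing features can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the class `ToyFeatureStore`, its methods, the feature name, and the team and entity identifiers are all hypothetical and do not reflect Palette's or Michelangelo's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyFeatureStore:
    """Illustrative in-memory feature store (hypothetical; not Palette's API)."""

    # feature name -> owning team, so other teams can discover existing features
    owners: Dict[str, str] = field(default_factory=dict)
    # (feature name, entity id) -> latest value, standing in for an online key-value store
    values: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def register(self, name: str, owner: str) -> None:
        """Declare a feature once; duplicate names are rejected to encourage reuse."""
        if name in self.owners:
            raise ValueError(
                f"feature {name!r} is already registered by {self.owners[name]}"
            )
        self.owners[name] = owner

    def write(self, name: str, entity_id: str, value: float) -> None:
        """Store the latest value of a registered feature for one entity."""
        if name not in self.owners:
            raise KeyError(f"unknown feature {name!r}")
        self.values[(name, entity_id)] = value

    def read(self, entity_id: str, names: List[str]) -> Dict[str, float]:
        """Fetch a feature vector for one entity, as a model would at prediction time."""
        return {name: self.values[(name, entity_id)] for name in names}


# One team registers and populates a feature (all names here are made up).
store = ToyFeatureStore()
store.register("restaurant.avg_prep_minutes", owner="team_a")
store.write("restaurant.avg_prep_minutes", "restaurant_42", 12.5)

# A different team reuses the same feature instead of recomputing it.
feature_vector = store.read("restaurant_42", ["restaurant.avg_prep_minutes"])
print(feature_vector)
```

A production feature store would back the online lookup with a low-latency database and keep offline and online values consistent; the sketch only shows the register-once, read-anywhere pattern.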