Naveen shows platform scaling to 130k

- Uber engineers detailed how the company’s internal platform “Up” grew from ad hoc scripts and Jenkins jobs into a Kubernetes-based deployment system.
- Uber now deploys about 4,000 services 100,000 times a day, launching 1.5 million pods daily across more than 50 clusters.
- The work sits inside Uber’s broader shift from Mesos and on-prem tools toward Kubernetes and AI-ready infrastructure. (uber.com)

Kubernetes is software that schedules containers, the app packages teams ship, across fleets of machines. Uber says that system now underpins the internal platform its engineers use to deploy about 4,000 services 100,000 times a day. (kubernetes.io) (uber.com)

The platform is called Up, Uber’s internal cloud layer for stateless microservices. In a September 2023 engineering post, Uber said its 4,500 stateless microservices were being deployed more than 100,000 times each week by 4,000 engineers and automated systems. (uber.com)

Uber traced that system back to 2014, when the company was still running a Python monolith backed by one PostgreSQL database. As teams and services multiplied, deployment methods ranged from bash scripts to Jenkins jobs before Uber built its in-house tool μDeploy to centralize releases. (uber.com)

By August 2024, Uber said it had kicked off a broader continuous-deployment overhaul in 2022, when it was managing about 4,500 microservices across three monorepos, 5,600 commits a week, and 7,000 production deployments a week. Only 7% of services were deploying automatically to production at that point. (uber.com)

The bigger infrastructure shift came in 2024, when Uber completed the migration of its stateless container orchestration platform from Apache Mesos to Kubernetes. Uber said Mesos had been deprecated internally in 2021, while Kubernetes offered broader cloud support, a larger open-source ecosystem, and ongoing security updates. (uber.com)

Uber’s Container Platform team said it manages more than 50 compute clusters across on-prem data centers and cloud providers including Oracle Cloud and Google Cloud. Each cluster runs roughly 5,000 to 7,500 hosts, about 250,000 cores, and around 50,000 pods. (uber.com)

At that scale, Uber said those services launch 1.5 million pods a day, with 120 to 130 pods a second in a single cluster. The company said the same clusters power Up, the federation layer developers use to manage service lifecycles. (uber.com)

Artificial intelligence pushed the platform further. Uber said in March 2024 that machine learning had become central to pricing, matching, and other business-critical systems, and that newer deep learning and generative artificial intelligence workloads were driving demand for new CPU and graphics processing unit infrastructure. (uber.com)

That led Uber to move machine learning workloads onto Kubernetes in early 2024 and to rebuild pieces of its Michelangelo platform around Ray and new in-house resource management tools. Uber said the old setup forced machine learning engineers to pick regions, zones, and graphics chip types by hand, which slowed experiments and wasted capacity. (uber.com)

The through line in Uber’s engineering posts is that platform work stopped being a side tool years ago. It became the system that decides how thousands of services, millions of pods, and newer AI jobs get shipped every day. (uber.com 1) (uber.com 2)
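
For readers who want to sanity-check the scale figures quoted above, here is a rough back-of-envelope sketch in Python. It uses only the rounded numbers reported in this briefing; the variable names and the `scale_summary` helper are illustrative and do not come from Uber, and the results are averages rather than anything Uber has published.

```python
# Back-of-envelope arithmetic on the figures quoted in the briefing.
# All inputs are the article's rounded numbers; names are illustrative only.

SECONDS_PER_DAY = 24 * 60 * 60

PODS_PER_DAY = 1_500_000          # pods launched daily, fleet-wide
CLUSTER_RATE = (120, 130)         # pods per second quoted for a single cluster
HOSTS = (5_000, 7_500)            # hosts per cluster, low and high estimate
CORES_PER_CLUSTER = 250_000
PODS_PER_CLUSTER = 50_000


def scale_summary() -> dict:
    """Derive rough averages from the quoted figures."""
    avg_fleet_rate = PODS_PER_DAY / SECONDS_PER_DAY
    return {
        # Average launch rate if the daily total were spread evenly over 24 hours.
        "avg_fleet_pods_per_sec": avg_fleet_rate,
        # How the quoted single-cluster rate compares with that fleet-wide average.
        "cluster_rate_vs_fleet_avg": tuple(r / avg_fleet_rate for r in CLUSTER_RATE),
        # Ranges run from the high host count to the low host count.
        "cores_per_host": (CORES_PER_CLUSTER / HOSTS[1], CORES_PER_CLUSTER / HOSTS[0]),
        "pods_per_host": (PODS_PER_CLUSTER / HOSTS[1], PODS_PER_CLUSTER / HOSTS[0]),
        "cores_per_pod": CORES_PER_CLUSTER / PODS_PER_CLUSTER,
    }


summary = scale_summary()
for name, value in summary.items():
    print(name, value)
```

Spread evenly across a day, 1.5 million launches works out to roughly 17 pods a second fleet-wide, so the 120 to 130 pods a second quoted for a single cluster is about seven times that average. That gap suggests the per-cluster figure describes bursts rather than a sustained rate, though that reading is an inference and not something the posts state. The per-cluster numbers also imply roughly 33 to 50 cores and 7 to 10 pods per host.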

