Matei Zaharia wins ACM prize
Databricks co-founder Matei Zaharia received the ACM Prize in Computing for foundational work across Apache Spark, Delta Lake and MLflow, highlighting the market value of data-and-ML systems engineering. The award underlines why platform-level data engineering remains a high-signal area for resumes and projects. (techcrunch.com, PR Newswire)
A lot of modern artificial intelligence still runs on plumbing built more than a decade ago, and on April 8 the Association for Computing Machinery gave one of computer science’s biggest early-career awards to the engineer behind a big chunk of that plumbing: Matei Zaharia. The prize is the 2025 Association for Computing Machinery Prize in Computing, announced in 2026, and it comes with $250,000. (acm.org)
Zaharia is best known for starting Apache Spark in 2009 during his doctoral work at the University of California, Berkeley. Spark is the software many companies use to split huge data jobs across many machines instead of trying to force one computer to do everything alone. (acm.org, spark.apache.org) That sounds abstract until you picture a spreadsheet so large it no longer fits on one laptop. Spark turns that one impossible file into many smaller tasks that can run across a cluster, then stitches the results back together fast enough for analytics, data engineering, and machine learning work. (spark.apache.org)
The first problem Spark helped solve was speed. The Association for Computing Machinery said earlier distributed data systems were too slow and too awkward for newer jobs like interactive analysis and machine learning, which is why Spark spread so quickly inside companies building data teams in the 2010s. (acm.org)
Then came a second problem: data lakes were cheap, but they were messy. Delta Lake added a transaction log on top of those raw file stores so teams could update tables more safely, keep metadata at large scale, and combine streaming data with batch data in one system. (docs.delta.io, databricks.com)
Then came the machine learning version of the same mess. MLflow gave teams a shared system to track experiments, store models, compare runs, and deploy what worked, instead of leaving results scattered across notebooks, folders, and one-off scripts. (mlflow.org, mlflow.github.io)
That is why this award is not just about one fast engine from Berkeley. The Association for Computing Machinery singled out Spark, Delta Lake, and MLflow together because they cover three different layers of the same stack: processing data, keeping data reliable, and managing models after training. (acm.org)
Zaharia now sits in two places at once: he is a co-founder and chief technology officer of Databricks, and he is also an associate professor at the University of California, Berkeley. That mix helps explain why his projects moved from research papers into tools used by companies running production systems. (people.eecs.berkeley.edu, databricks.com)
The timing says something about where computing is in 2026. While the public story of artificial intelligence is dominated by chatbots and giant models, this prize went to infrastructure that makes those systems usable at scale once the demo has to touch real data, real pipelines, and real teams. (acm.org, techcrunch.com)
In other words, the glamour is in the model, but the money is still in the system around it. The Association for Computing Machinery honored the person who helped build that system, and that is why a prize announced on April 8, 2026 landed far beyond academia. (acm.org, techcrunch.com)