Databricks Unveils Lakebase for AI
Data and AI platform Databricks has introduced Lakebase, a PostgreSQL database optimized for AI and analytics workloads. The product aims to combine the reliability of PostgreSQL with the scalability of modern data lakehouses. The move reaffirms the enterprise value of open-source compatibility for new infrastructure products targeting data scientists and ML engineers.
- The core architectural principle of Lakebase is the separation of compute and storage, allowing each to scale independently and enabling features like "scale to zero" to reduce costs. This is intended to prevent a single heavy query from impacting all live operations, a common bottleneck in traditional database designs.
- The technology was assembled through strategic acquisitions, including the serverless PostgreSQL company Neon and a company named Mooncake, which enhanced the integration between PostgreSQL and the data lakehouse.
- A key developer-focused feature is "instant database branching," which creates zero-copy clones of a database. This allows engineering teams to test and develop on production data without affecting the live environment, mirroring the "branching" workflow common in version control systems like Git.
- Databricks was founded in 2013 by the seven creators of the open-source Apache Spark project, who were researchers at UC Berkeley. The company was formed after Ben Horowitz of venture capital firm Andreessen Horowitz invested $14 million and encouraged the team to build a business around their technology, which had initially struggled to find corporate adopters.
- Lakebase is entering a competitive market, with rivals like Microsoft Azure also launching PostgreSQL-compatible services for AI workloads, such as HorizonDB. Databricks aims to differentiate itself by deeply integrating Lakebase with its existing lakehouse, eliminating the need for separate "reverse ETL" pipelines to move data from analytical systems back to operational ones.
- The product supports the latest Postgres 17 and includes the `pgvector` extension, which is critical for building AI-driven search and retrieval applications. Highlighted use cases include real-time feature serving for machine learning models and providing persistent memory for AI agents.
- Since its launch into public preview in June 2025, Databricks reports that Lakebase adoption has grown at more than twice the rate of its data warehousing product, with thousands of companies running production workloads on it. The service is now generally available on AWS and in public preview on Azure.
- Co-founder and CEO Ali Ghodsi began coding at age eight on a Commodore 64 after his family fled Iran for Sweden. The founding team's work on Apache Spark originated at UC Berkeley's AMPLab, with the project being partly inspired by the challenges seen in the historic $1 million Netflix Prize competition to improve movie recommendations.
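
For readers unfamiliar with the `pgvector` extension mentioned above, the following is a minimal sketch of the kind of similarity ranking it enables. It is plain Python rather than Lakebase code: the `documents` table, the `embedding` column, and the `query_embedding` parameter in the SQL string are hypothetical names chosen for illustration, and the query is shown for reference only and never executed. The one detail taken from pgvector itself is that its `<=>` operator orders rows by cosine distance, which the helper functions reproduce on toy data.

```python
import math

# Hypothetical SQL for a pgvector-enabled Postgres database (not executed here).
# Table, column, and parameter names are assumptions made for illustration;
# `<=>` is pgvector's cosine-distance operator.
EXAMPLE_QUERY = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %(query_embedding)s
LIMIT 2;
"""


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the quantity `<=>` orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, rows, k=2):
    """Rank (id, embedding) rows by cosine distance to the query; keep the top k ids."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:k]]


# Toy three-dimensional embeddings; real models emit far more dimensions.
rows = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]

print(nearest([1.0, 0.0, 0.0], rows))  # ['doc-a', 'doc-b']
```

In a real deployment the ranking would run inside the database, typically backed by an index, rather than in application code; the Python version only makes the ordering rule visible.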