Databricks Launches Lakebase for AI Workloads
Databricks has introduced Lakebase, a PostgreSQL-compatible database designed for AI and analytics workloads on its open lakehouse architecture. Lakebase enables teams to use standard PostgreSQL syntax to query data stored in low-cost cloud object storage. The system is optimized for high-performance AI workloads and aims to unify transactional and analytical data under a single governance model.
- The architecture of Lakebase separates compute from storage, allowing each to scale independently to avoid resource contention that can slow down live operations. This serverless design enables compute resources to scale down to zero, meaning costs are primarily for data storage when the database is not in use.
- Lakebase is built on technology from Databricks' acquisitions of Neon, a PostgreSQL company, and Mooncake, which improved PostgreSQL integration with the lakehouse. It supports PostgreSQL 17 and includes the pgvector extension for AI-driven search applications.
- A key developer-focused feature is "database branching," which uses a copy-on-write mechanism to create instantaneous, zero-copy clones of a database. This allows for isolated environments for testing and development on production data without impacting the primary branch.
- To address latency issues typically associated with object storage, Lakebase incorporates a middle caching layer. This is designed to support low-latency queries (under 10ms) and high-concurrency transactions.
- Governance is integrated through Unity Catalog, Databricks' unified governance layer, but it also supports standard PostgreSQL roles for users who prefer that interface. This allows for consistent access policies across both transactional and analytical data.
- The introduction of Lakebase is part of a broader platform expansion by Databricks, which also includes Lakeflow for data ingestion and orchestration, and AI/BI Genie, a natural language interface for business users. This strategy aims to create a comprehensive Data Intelligence Platform.
- While offering standard PostgreSQL compatibility, Lakebase has some limitations; for instance, it does not allow access to the host operating system or superuser privileges, using a `databricks_superuser` role instead. Additionally, some parameters can only be configured at the session, database, or role level, not at the instance level.
- This move follows a series of strategic acquisitions by Databricks, including Einblick, a natural language data science notebook, in January 2024, and Tabular, a company focused on Apache Iceberg, in June 2024 in a deal valued between $1 billion and $2 billion.
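
The copy-on-write idea behind database branching can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Lakebase's or Neon's implementation: the `Layer` and `Branch` classes and the key names are hypothetical, and a real system tracks page versions in a storage engine rather than in dictionaries. It shows why creating a branch copies no data and why writes on one branch stay invisible to the other.

```python
from typing import Dict, Optional


class Layer:
    """A set of page versions stacked on top of an older, shared layer."""

    def __init__(self, parent: Optional["Layer"] = None) -> None:
        self.parent = parent
        self.pages: Dict[str, str] = {}


class Branch:
    """Hypothetical branch handle: reads walk the layer chain, writes stay local."""

    def __init__(self, base: Optional[Layer] = None) -> None:
        # Every branch owns one private, writable layer on top of shared history.
        self._head = Layer(parent=base)

    def write(self, key: str, value: str) -> None:
        # Writes never touch shared layers, so other branches are unaffected.
        self._head.pages[key] = value

    def read(self, key: str) -> Optional[str]:
        # The newest version wins: search from the private layer back through history.
        layer: Optional[Layer] = self._head
        while layer is not None:
            if key in layer.pages:
                return layer.pages[key]
            layer = layer.parent
        return None

    def branch(self) -> "Branch":
        # Stop writing to the current layer and share it with the new branch.
        # No pages are copied; both branches get fresh private layers on top.
        frozen = self._head
        self._head = Layer(parent=frozen)
        return Branch(base=frozen)


main = Branch()
main.write("users:1", "alice")

dev = main.branch()              # constant-time: only a pointer is shared
dev.write("users:1", "test-user")
main.write("users:2", "bob")     # written after the branch point
```

In this sketch `dev` sees the data that existed at the branch point, its own overwrite of `users:1` does not reach `main`, and the later write to `users:2` on `main` does not appear in `dev`.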
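
For the pgvector point, a similarity search might be assembled roughly as follows. The sketch assumes a hypothetical `documents` table with `id`, `content`, and an `embedding` column of pgvector's `vector` type, and a driver that uses `%s` placeholders (as psycopg does). Only the `<=>` cosine-distance operator and the bracketed text format for vectors come from pgvector itself. The code builds the statement and its parameters without connecting to a database.

```python
from typing import Sequence, Tuple


def vector_literal(values: Sequence[float]) -> str:
    """Render a Python sequence in pgvector's text format, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_similarity_query(
    query_embedding: Sequence[float], k: int = 5
) -> Tuple[str, Tuple[str, int]]:
    """Return (sql, params) for a k-nearest-neighbour search by cosine distance.

    The table and column names are assumptions made for illustration.
    """
    sql = (
        "SELECT id, content "
        "FROM documents "
        "ORDER BY embedding <=> %s::vector "  # <=> is pgvector's cosine distance
        "LIMIT %s"
    )
    return sql, (vector_literal(query_embedding), k)


sql, params = build_similarity_query([0.1, 0.2, 0.3], k=3)
```

The resulting pair can be passed to a driver's `cursor.execute(sql, params)`; keeping the embedding as a bound parameter avoids interpolating values into the SQL text.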
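
The configuration limitation noted in the list maps onto standard PostgreSQL statements: `SET` for the current session, and `ALTER DATABASE ... SET` or `ALTER ROLE ... SET` for the other two scopes. The helper below is a hypothetical sketch that generates those statements. `work_mem` and the database and role names are placeholders, and which parameters Lakebase actually permits at each level is not stated here.

```python
import re
from typing import Optional

# Conservative pattern for parameter names such as work_mem or myext.setting.
_PARAM_NAME = re.compile(r"[a-z_][a-z0-9_.]*")


def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal (assumes standard_conforming_strings is on)."""
    return "'" + value.replace("'", "''") + "'"


def scoped_setting(
    scope: str, param: str, value: str, target: Optional[str] = None
) -> str:
    """Build a statement that sets `param` at session, database, or role level."""
    if not _PARAM_NAME.fullmatch(param):
        raise ValueError(f"unexpected parameter name: {param!r}")
    assignment = f"{param} = {quote_literal(value)}"
    if scope == "session":
        return f"SET {assignment}"
    if scope in ("database", "role"):
        if not target:
            raise ValueError(f"{scope} scope needs a target name")
        return f"ALTER {scope.upper()} {quote_ident(target)} SET {assignment}"
    # Instance-wide changes (ALTER SYSTEM) need superuser rights, which the
    # article says are unavailable, so this sketch rejects any other scope.
    raise ValueError(f"unsupported scope: {scope!r}")


session_sql = scoped_setting("session", "work_mem", "64MB")
database_sql = scoped_setting("database", "work_mem", "64MB", target="appdb")
role_sql = scoped_setting("role", "work_mem", "64MB", target="analyst")
```

Settings applied with `ALTER DATABASE` or `ALTER ROLE` take effect for new sessions, while `SET` changes only the current one.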