Databricks backs Apache Iceberg v3
Databricks put Apache Iceberg v3 into public preview to speed ingestion, improve query planning, and simplify large-scale lakehouse management. That push toward open table formats is already showing up in partner products: Persistent Systems launched a merchant risk management solution built on the Databricks Data Intelligence Platform, highlighting a shift from standalone warehousing toward integrated AI data platforms. (databricks.com) (prnewswire.com)
Most data lakes still work like a warehouse where every update means moving pallets around. Apache Iceberg is the table format meant to fix that by keeping a detailed map of files so engines can find, add, or remove data without constantly rewriting everything. (iceberg.apache.org)
The new part is version 3 of that map. The Apache Iceberg specification added row lineage, deletion vectors, a Variant type for mixed-structure data, default column values, new partitioning transforms, and encryption keys. (iceberg.apache.org)
Deletion vectors are the easiest feature to picture. Instead of rewriting a whole data file just to mark a few rows as deleted, Iceberg v3 can keep a side list of which rows to skip, which Amazon Web Services says cuts write amplification and speeds up writes. (aws.amazon.com)
Row lineage is the audit trail. Iceberg v3 gives each row a stable identifier and records the commit sequence number in which the row was last changed, which helps with incremental processing and with tracing how a record changed across multiple updates. (iceberg.apache.org)
Variant is for the messy stuff companies actually collect. Databricks says Iceberg v3 can now handle semi-structured data that previously needed brittle workarounds, so teams can query records that do not all share the same columns. (databricks.com)
Databricks put those Iceberg v3 features into public preview on April 9, 2026. Its documentation says the preview works through Unity Catalog and applies to managed Iceberg tables, foreign Iceberg tables, and managed Delta Lake tables with Universal Format enabled. (databricks.com) (docs.databricks.com)
That last part is the bigger strategic move. Databricks says Iceberg v3 brings the data layer of Iceberg and Delta Lake closer together, so customers can build interoperable pipelines without rewriting the underlying data. (databricks.com 1) (databricks.com 2)
Databricks had already moved in this direction in 2025 by adding public-preview support for Apache Iceberg tables in Unity Catalog, including managed Iceberg tables and governance for Iceberg tables stored in outside catalogs. Iceberg v3 is the next step because it layers the newer table features on top of that open-table push. (databricks.com 1) (databricks.com 2)
You can see the commercial angle in what partners are building on top. Persistent Systems said on April 9, 2026, that it launched a merchant risk management and fraud detection product on the Databricks Data Intelligence Platform for banks, acquirers, and payment service providers. (prnewswire.com)
Persistent says the product uses real-time intelligence and workflows to reduce fraud losses, improve detection accuracy, and lower manual review effort. That is the kind of application that benefits when one platform can store raw payment data, govern it across tools, and feed it directly into artificial intelligence models without copying it into a separate warehouse first. (prnewswire.com)
The fight here is no longer just over who stores tables fastest. It is over whether companies can keep one governed copy of data in an open format and use it for analytics, transactions, and artificial intelligence at the same time. (databricks.com) (iceberg.apache.org)
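
For readers who want to see the deletion-vector idea from earlier in concrete terms, here is a minimal Python sketch. It is a toy model, not Iceberg's or Databricks' implementation: the `DataFile` and `DeletionVector` classes, the `scan` function, and the sample file name are all hypothetical, and a plain Python set stands in for the compressed bitmap a real engine would use. The point is only that a delete touches the side list, not the data file.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for an immutable data file (for example, Parquet)."""
    path: str
    rows: tuple


@dataclass
class DeletionVector:
    """Hypothetical side list of deleted row positions for one data file."""
    deleted_positions: set = field(default_factory=set)

    def mark_deleted(self, position: int) -> None:
        # Deleting a row only records its position; the data file is untouched.
        self.deleted_positions.add(position)


def scan(data_file: DataFile, dv: DeletionVector):
    """Yield the rows that are still live, skipping positions listed in the vector."""
    for position, row in enumerate(data_file.rows):
        if position not in dv.deleted_positions:
            yield row


# Illustrative data only: three payment rows in one file.
data_file = DataFile(
    path="payments-0001.parquet",
    rows=(
        {"payment_id": 101},
        {"payment_id": 102},
        {"payment_id": 103},
    ),
)

dv = DeletionVector()
dv.mark_deleted(1)  # "delete" the second row without rewriting the file

live_ids = [row["payment_id"] for row in scan(data_file, dv)]
print(live_ids)  # [101, 103]
```

The data file still holds all three rows after the delete; readers simply skip the flagged position. That is the write-amplification saving in miniature: the cost of a delete scales with the number of rows removed rather than with the size of the file.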