Lakehouse moves practical
Databricks has put Apache Iceberg v3 into public preview to enable a single-copy lakehouse that multiple engines can read without full-table replication, aiming to reduce pipeline drift and reconciliation pain. A separate case study shows a massive geospatial/property data lakehouse can accelerate collateral intelligence, provided access controls, lineage, and freshness policies are enforced as first-class capabilities. (databricks.com, nallas.com/engineering-a-massive-scale-geospatial-property-data-lakehouse/)
A lakehouse is supposed to mean one shared copy of data, but many companies still keep one version for Databricks, another for Trino, another for Snowflake, and then spend nights explaining why the totals do not match. Databricks said on April 9, 2026 that Apache Iceberg version 3 is now in public preview on its platform to cut out that copying step. (databricks.com)
Apache Iceberg is a table format, which is the rulebook that tells software where rows live, which files belong to a table, and what changed between versions. If that rulebook is open and widely supported, different query engines can read the same table instead of each engine demanding its own private duplicate. (databricks.com)
The old pain was not storage bills alone. Every extra copy created another pipeline, another refresh schedule, and another chance for “customer count” to mean 10,241 in one dashboard and 10,198 in another because one job ran at 2 a.m. and the other failed at 2:07. (databricks.com)
Iceberg version 3 adds three pieces that make the shared-copy idea more practical. Databricks highlights deletion vectors, row lineage, and the Variant data type as the key changes now exposed in preview. (databricks.com, docs.databricks.com)
Deletion vectors are a way to mark which rows are gone without rewriting every file that used to contain them. That is the data equivalent of crossing out three names on a printed guest list instead of reprinting the whole list for a 10,000-person event. (databricks.com, docs.databricks.com)
Row lineage is a record of where each row came from and what happened to it on the way through the pipeline. That gives teams a row-level audit trail for incremental processing, which is useful when a regulator, an analyst, or a risk team asks why one property or one loan changed status. (databricks.com, docs.databricks.com)
Variant is a data type for semi-structured data such as JavaScript Object Notation payloads that do not fit neatly into fixed columns on day one. Databricks says Iceberg version 3 now brings that flexibility into the open specification so teams can keep messy event data, application logs, and nested records in the same governed table layer. (databricks.com)
Databricks is also tying this to Unity Catalog, its governance layer, so the same Iceberg tables can be discovered, permissioned, and monitored from one place. Its documentation says Iceberg version 3 features are in public preview in Databricks Runtime 18.0 and above, while managed and foreign Iceberg tables are available in public preview starting with Databricks Runtime 16.4 long-term support. (docs.databricks.com)
The second half of the story is a case study from Nallas, which describes a financial-services lakehouse built around geospatial and property data for collateral intelligence. The system joins parcel records, ownership details, valuation signals, tax data, zoning information, flood and wildfire indicators, and map layers so a lender can inspect one property as a living asset instead of a static address line. (nallas.com)
That kind of lakehouse gets fast only if governance is built into the plumbing. Nallas says the platform treated access controls, lineage, and freshness policies as core features, because a credit team looking at stale flood-risk data or unrestricted borrower data is not using “better analytics” so much as a faster way to make a bad decision. (nallas.com)
Put those two pieces together and the pitch gets more concrete. Iceberg version 3 is Databricks trying to make the single-copy, many-engines model real in day-to-day operations, and the Nallas case study shows what happens when that shared layer is fed with high-value data and guarded tightly enough that people can actually trust what they are reading. (databricks.com, nallas.com)
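For readers who want to see the deletion-vector idea from earlier in concrete terms, here is a minimal Python sketch of the guest-list analogy. It is a toy model, not Iceberg's on-disk format or any Databricks API: the class names (DataFile, Table), the file paths, and the use of plain Python sets to hold deleted row positions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """An immutable data file: once written, its rows are never edited."""
    path: str     # hypothetical file name, for illustration only
    rows: tuple


@dataclass
class Table:
    """Toy table: immutable files plus one deletion vector per file."""
    files: list
    # Maps file path -> set of row positions marked as deleted.
    deletion_vectors: dict = field(default_factory=dict)

    def delete_where(self, predicate):
        """Mark matching rows as deleted without rewriting any data file."""
        marked = 0
        for f in self.files:
            dv = self.deletion_vectors.setdefault(f.path, set())
            for pos, row in enumerate(f.rows):
                if pos not in dv and predicate(row):
                    dv.add(pos)
                    marked += 1
        return marked

    def scan(self):
        """Yield live rows, skipping positions listed in each deletion vector."""
        for f in self.files:
            dv = self.deletion_vectors.get(f.path, set())
            for pos, row in enumerate(f.rows):
                if pos not in dv:
                    yield row


# The "guest list": two files that are never rewritten.
guests = Table(files=[
    DataFile("part-0.parquet", ("Ana", "Ben", "Cho")),
    DataFile("part-1.parquet", ("Dee", "Eli", "Fay")),
])

# Cross out three names instead of reprinting the list.
removed = guests.delete_where(lambda name: name in {"Ben", "Eli", "Fay"})
print(removed)              # 3
print(list(guests.scan()))  # ['Ana', 'Cho', 'Dee']
```

The point of the sketch is that the delete touches only the small side structure: the original files still hold all six names, and every reader that honors the deletion vectors sees the same three survivors.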