Data Contracts in CI/CD Pipelines Gain Traction

Data governance is increasingly being integrated into CI/CD pipelines through the use of enforceable data contracts. An engineer noted that this approach can automatically block code merges if data quality or schema checks fail. This method is seen as essential for building trust in data within regulated environments.

- The concept of data contracts was created by Andrew Jones to address upstream data changes that unexpectedly broke downstream processes while he was at the financial tech company GoCardless.
- In practice, this "shift-left" approach moves data validation into the CI/CD pipeline, turning what would be a runtime incident into a preventable build failure.
- Data contracts are foundational to the Data Mesh architecture, serving as the enforceable agreement that defines the structure, quality, and service-level objectives for a domain's "data products".
- While schema validation checks the structure of data, a data contract is a broader, machine-readable agreement that also defines semantics, data quality rules, SLAs, and ownership.
- For dbt users, model contracts are defined in YAML files and validate the shape of a model's output *before* it is built; this is distinct from dbt tests, which validate the data *after* the model has been created.
- The Open Data Contract Standard (ODCS), hosted by the Linux Foundation, is an emerging open-source specification for defining contracts in a vendor-agnostic way.
- In regulated industries like healthcare, data contracts help enforce governance by codifying rules for data access, privacy (like HIPAA), and the handling of sensitive information.
- The ecosystem of enforcement tools includes schema registries like Confluent's for Kafka, dbt's native contracts, and data quality frameworks like Great Expectations.
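
To make the "shift-left" idea concrete, here is a minimal sketch in plain Python of a contract that carries ownership and quality rules in addition to a schema, plus a helper that turns violations into a non-zero exit code a CI job could use to block a merge. Everything in it is hypothetical and for illustration only: the `DataContract` class, the `validate` and `ci_exit_code` functions, and the `payments` contract are assumptions, not the API of dbt, ODCS, Great Expectations, or any other tool named above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DataContract:
    """Hypothetical contract: structure plus quality rules and ownership."""
    name: str
    owner: str
    schema: dict[str, type]  # column name -> expected Python type
    quality_rules: dict[str, Callable[[dict[str, Any]], bool]] = field(
        default_factory=dict
    )


def validate(contract: DataContract, rows: list[dict[str, Any]]) -> list[str]:
    """Return human-readable violations; an empty list means the contract holds."""
    violations: list[str] = []
    for i, row in enumerate(rows):
        structurally_valid = True
        # Structural check: each contracted column exists with the expected type.
        for column, expected in contract.schema.items():
            if column not in row:
                violations.append(f"row {i}: missing column '{column}'")
                structurally_valid = False
            elif not isinstance(row[column], expected):
                violations.append(f"row {i}: '{column}' is not {expected.__name__}")
                structurally_valid = False
        # Quality rules assume the schema holds, so skip them on malformed rows.
        if structurally_valid:
            for rule_name, rule in contract.quality_rules.items():
                if not rule(row):
                    violations.append(f"row {i}: failed rule '{rule_name}'")
    return violations


def ci_exit_code(contract: DataContract, rows: list[dict[str, Any]]) -> int:
    """Print violations and return 1 if any exist, so a CI step can fail the build."""
    violations = validate(contract, rows)
    for violation in violations:
        print(f"[{contract.name}] {violation} (owner: {contract.owner})")
    return 1 if violations else 0


# Illustrative contract; the field names and rules are assumptions.
payments = DataContract(
    name="payments",
    owner="payments-team",
    schema={"payment_id": str, "amount_cents": int, "currency": str},
    quality_rules={
        "amount_is_positive": lambda r: r["amount_cents"] > 0,
        "currency_is_three_letters": lambda r: len(r["currency"]) == 3,
    },
)
```

In a pipeline, a small wrapper script could load a sample of the changed dataset and call `sys.exit(ci_exit_code(payments, rows))`; because CI systems treat a non-zero exit status as a failed step, a contract violation would stop the merge instead of surfacing later as a runtime incident. Real deployments would more likely express the contract in a declarative file and delegate the checks to one of the tools listed above.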
