dbt Formalizes 'State-Aware Orchestration'
dbt is formalizing a major efficiency feature called state-aware orchestration (SAO). The new approach ensures that dbt jobs automatically determine which models need to be built based on code or data changes, eliminating brute-force rebuilds. It's a significant step toward smarter, faster, and cheaper data pipelines.
This move from stateless to stateful job execution is powered by the dbt Fusion engine. Previously, dbt runs were stateless, rebuilding every selected model in the directed acyclic graph (DAG) regardless of whether its inputs had changed. Now, dbt maintains a real-time fingerprint of both model code and the upstream data state to pinpoint exactly what needs to be refreshed.
The new orchestration goes beyond dbt Core's `state:modified+` selector, which compares a project's current state to a stored manifest from a previous run. State-aware orchestration instead uses a shared, real-time model state across all jobs in an environment, which lets it detect upstream data changes at runtime and handle concurrent jobs without collisions. If two jobs need the same model, one waits for the other to finish rather than both building it simultaneously.
This capability is an evolution of the "Slim CI" concept, which focused on running only modified models during continuous integration to save time and compute. While Slim CI relies on comparing a pull request to a production manifest, state-aware orchestration provides a more dynamic, real-time check for both code and data changes in any job, not just CI.
By reading warehouse metadata to see when source tables were last modified, dbt can skip rebuilding models whose upstream data is unchanged. This removes the need for brittle, time-based scheduling (e.g., running a job every hour) and allows for more intent-based configurations, such as specifying data freshness requirements directly in the project. Early customer feedback indicates an average of 10% cost savings just by enabling the feature.
This shift toward more intelligent and decentralized execution aligns with broader architectural patterns like Data Mesh. By enabling more autonomous, efficient, and reliable data products, state-aware orchestration is a key technical enabler for domain-owned data models within a larger, interconnected dbt Mesh architecture. This allows organizations to scale analytics engineering by breaking down monolithic dbt projects.
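The skip decision described above (fingerprint the model code together with the state of its inputs, then rebuild only when that fingerprint differs from the shared record) can be illustrated with a small Python sketch. This is not dbt's implementation or API: `fingerprint`, `SharedState`, and `plan_run` are hypothetical names, the "last modified" markers are stand-ins for warehouse metadata, and the shared store is a plain in-memory dictionary with no locking.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(sql: str, upstream_versions: dict[str, str]) -> str:
    """Hash model code together with the last-seen version of each input."""
    h = hashlib.sha256()
    h.update(sql.encode("utf-8"))
    for name in sorted(upstream_versions):  # sorted for a stable hash
        h.update(f"|{name}={upstream_versions[name]}".encode("utf-8"))
    return h.hexdigest()


@dataclass
class SharedState:
    """Hypothetical environment-wide record of what each model was built from."""
    built: dict[str, str] = field(default_factory=dict)

    def needs_build(self, model: str, fp: str) -> bool:
        return self.built.get(model) != fp

    def record(self, model: str, fp: str) -> None:
        self.built[model] = fp


def plan_run(models, source_versions, state):
    """Return the models to rebuild.

    models: list of (name, sql, upstream_names) in topological order.
    source_versions: source name -> last-modified marker (assumed input).
    """
    versions = dict(source_versions)
    rebuilt = []
    for name, sql, upstream in models:
        fp = fingerprint(sql, {u: versions[u] for u in upstream})
        if state.needs_build(name, fp):
            state.record(name, fp)  # a real run would build the model here
            rebuilt.append(name)
        versions[name] = fp  # downstream models key off this fingerprint
    return rebuilt


models = [
    ("stg_orders", "select * from raw.orders", ["raw.orders"]),
    ("stg_customers", "select * from raw.customers", ["raw.customers"]),
    ("orders_by_customer", "select 1", ["stg_orders", "stg_customers"]),
]
state = SharedState()

# First run: nothing is recorded yet, so every model is built.
first = plan_run(models, {"raw.orders": "t1", "raw.customers": "t1"}, state)
# Second run: same code, same source markers, so nothing is rebuilt.
second = plan_run(models, {"raw.orders": "t1", "raw.customers": "t1"}, state)
# Third run: only raw.orders changed, so stg_customers is skipped.
third = plan_run(models, {"raw.orders": "t2", "raw.customers": "t1"}, state)
```

Because each model's fingerprint feeds into its children's fingerprints, a change to one source propagates only along the affected branch of the DAG, while sibling branches are skipped. A code edit behaves the same way, since the SQL text is part of the hash.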