Tiny, cheap data stack

Simon Späti posted a full open‑source stack for mid‑scale companies—Dagster + DuckDB + dbt + Airbyte—positioned as a low‑cost path for data engineering. (x.com) That combination gives ingestion, in‑process analytics, transformation and orchestration without heavy cloud lock‑in, which is attractive for teams trying to keep costs predictable. (x.com)

A lot of data teams still build like they’re running a Fortune 500 company: one tool to pull data, one warehouse to store it, one scheduler to run jobs, and a monthly cloud bill that grows faster than the dashboards. Simon Späti’s post landed because it showed a smaller setup that can do the same basic jobs with open-source parts instead of a giant platform. (x.com)

The first piece is Airbyte, which is the part that goes out and fetches data from other systems. Airbyte’s own docs describe it as the layer that extracts from sources and loads into a destination, using prebuilt connectors instead of custom scripts for every application programming interface. (docs.airbyte.com)

The second piece is DuckDB, which is a database that runs inside the same process as your code instead of sitting on a separate server. DuckDB says it is an “in-process” analytics database that can run on a laptop, a server, or even in a browser, and query files like Parquet and JSON directly. (duckdb.org) That changes the cost shape. A team can analyze a few gigabytes or tens of gigabytes with one binary and local or attached storage, instead of paying for a warehouse that is always on whether anyone is querying it or not. (duckdb.org)

The third piece is dbt, short for data build tool, which takes raw tables and turns them into cleaned, tested models written in Structured Query Language. dbt’s docs frame it as a way to build transformation logic with software engineering habits like version control, testing, and documentation. (docs.getdbt.com)

The fourth piece is Dagster, which is the traffic controller that decides what runs, in what order, and on what schedule. Dagster’s dbt integration works at the level of individual dbt models, so a team can run only the pieces that changed and track failures model by model. (docs.dagster.io)

Put together, the stack works like this: Airbyte copies raw data in, DuckDB stores and queries it, dbt reshapes it into analysis-ready tables, and Dagster tells each step when to fire. Airbyte and Dagster have documented integrations for triggering syncs and orchestrating those jobs alongside downstream transformations. (docs.airbyte.com) (docs.dagster.io)

This is not a brand-new idea. Airbyte has published “DAD Stack” guides for Airbyte, dbt, and Dagster for more than two years, usually with a cloud warehouse like BigQuery or Snowflake at the center, and the newer twist is swapping that expensive center for DuckDB. (airbyte.com 1) (airbyte.com 2)

DuckDB is what makes the pitch feel timely in 2026. DuckDB published an April 4, 2025 post showing fully local transformation pipelines with the dbt-duckdb adapter, which gave teams a documented path to run dbt on DuckDB without renting a separate warehouse first. (duckdb.org)

The sweet spot is not a two-person startup with one spreadsheet, and it is not a bank processing petabytes every hour. It is the mid-size company that has enough data to need reliable pipelines, but not enough scale to justify Snowflake, Databricks, or a full platform team every time finance asks why the analytics bill doubled. (duckdb.org) (docs.dagster.io) (docs.getdbt.com)

The tradeoff is that cheap and simple only stays cheap and simple up to a point. Once a team needs always-on concurrency, strict access controls, many business users hitting the same system at once, or very large distributed workloads, the tiny stack stops being tiny and starts needing the heavier warehouse it was designed to avoid. (duckdb.org) (docs.getdbt.com)
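
For readers who want to see the division of labor in miniature, here is a toy Python sketch of the four roles described above: load raw rows, store them in an in-process database, reshape them with SQL, and run the steps in dependency order. It is an illustration under loud assumptions, not the real tools. Python's built-in sqlite3 stands in for DuckDB, the standard library's graphlib stands in for an orchestrator like Dagster, and every name here (RAW_ORDERS, extract_load, transform, check, run_pipeline) is hypothetical rather than taken from Airbyte, dbt, or Dagster.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical raw records, standing in for what an ingestion sync would deliver.
RAW_ORDERS = [("a-1", "paid", 120.0), ("a-2", "refunded", 40.0), ("a-3", "paid", 80.0)]


def extract_load(con):
    """Ingestion role: copy raw rows into the in-process database."""
    con.execute("CREATE TABLE raw_orders (order_id TEXT, status TEXT, amount REAL)")
    con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)


def transform(con):
    """Transformation role: build a cleaned, analysis-ready table with SQL."""
    con.execute(
        "CREATE TABLE paid_orders AS "
        "SELECT order_id, amount FROM raw_orders WHERE status = 'paid'"
    )


def check(con):
    """Testing role: fail loudly if the cleaned table contains bad rows."""
    bad = con.execute(
        "SELECT COUNT(*) FROM paid_orders WHERE amount <= 0"
    ).fetchone()[0]
    if bad:
        raise ValueError(f"{bad} non-positive amounts in paid_orders")


STEPS = {"extract_load": extract_load, "transform": transform, "check": check}

# Orchestration role: each step maps to the set of steps it depends on.
DEPENDS_ON = {
    "extract_load": set(),
    "transform": {"extract_load"},
    "check": {"transform"},
}


def run_pipeline():
    """Run every step in dependency order against one in-memory database."""
    con = sqlite3.connect(":memory:")  # in-process: no separate server to run
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        STEPS[name](con)
    total = con.execute("SELECT SUM(amount) FROM paid_orders").fetchone()[0]
    con.close()
    return order, total


if __name__ == "__main__":
    order, total = run_pipeline()
    print("ran steps:", order)
    print("total paid amount:", total)
```

In the real stack each role would be handled by the dedicated tool, with connectors, model-level runs, and scheduling that this sketch leaves out. The point is only that the whole loop can live in one process on one machine.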
