LLMs meet query planning

Published April 23, 2026 by The Daily Scout

- Databricks published research evaluating whether LLM agents can optimise database join order decisions.
- The work applies frontier-model experimentation to a classic database performance problem: join-order optimisation.
- The research signals growing interest in applying model intelligence to infrastructure and operational tuning beyond end-user interfaces. (databricks.com)

Why it matters

A database can answer the same SQL question in many ways, and Databricks says a prototype large language model agent picked faster join orders in most of its April 22 tests. (databricks.com) Join order is the sequence a database uses to combine tables, and that choice can change runtime sharply when a query touches many datasets. Databricks said the number of possible plans grows exponentially with the number of tables, and analytics queries often join 20 to 30 tables. (databricks.com)

Traditional query planners already try to solve that search problem with statistics and cost estimates, but major databases still expose knobs for cases where the default plan is not optimal. PostgreSQL’s current documentation says users sometimes improve plans by changing planner cost constants or collecting better statistics, and Databricks’ own docs say its optimizer can struggle on queries with many joins and aggregations. (postgresql.org) (docs.databricks.com)

Databricks did not put an LLM in the hot path of every query. Its prototype agent worked offline in a loop: propose a join order, run the query, inspect post-execution statistics, and try again until it hit an iteration limit. (databricks.com) (ucbskyadrs.github.io)

In the company’s reported benchmark results, the agent beat the Databricks optimizer in 80% of cases and improved query latency by 1.3x overall. The follow-up writeup from the AI-Driven Research for Systems group said the agent typically searched for 10 to 20 iterations before outperforming the default plan on the Join Order Benchmark, or JOB. (databricks.com) (ucbskyadrs.github.io) (github.com)

The benchmark itself is not a new Databricks invention. JOB is a long-running public benchmark built around SQL queries from the paper “How Good Are Query Optimizers, Really?”, and it has been widely used to compare join-order systems. (github.com 1) (github.com 2)

That puts the new work in a familiar database tradition: researchers have already tried Bayesian optimization, deep reinforcement learning, and other search methods for join ordering. The Databricks and University of Pennsylvania collaboration shifts that contest toward frontier language models that can read query structure, look at runtime feedback, and keep revising plans. (speculative.tech) (vldb.org) (databricks.com)

The company’s follow-up analysis, published April 16 by the University of California, Berkeley-linked ADRS group, said the agent’s traces showed tactics such as picking an anchor table, breaking queries into clusters, and testing hypotheses against execution results. That is closer to an automated database administrator running experiments than to a chatbot writing SQL. (ucbskyadrs.github.io)

The immediate takeaway is narrower than “LLMs replace query optimizers.” Databricks tested an offline agent on a classic planning problem, and the result points to a model-assisted tuning workflow for infrastructure software that still depends on real executions, statistics, and existing database engines. (databricks.com) (postgresql.org)
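
The offline loop described in this section (propose a join order, run the query, inspect the results, retry until an iteration limit) can be sketched roughly as follows. This is a minimal illustration, not Databricks’ implementation: the names `tune_join_order`, `make_swap_proposer`, and `toy_execute` are hypothetical, the proposer is a random table swap standing in for the LLM, and the "latency" comes from a made-up cost model with invented row counts rather than a real query engine.

```python
import random
from typing import Callable, List, Sequence, Tuple

Order = Tuple[str, ...]
History = List[Tuple[Order, float]]

# Invented row counts, for illustration only.
TOY_ROWS = {"orders": 1000, "customers": 100, "items": 5000, "regions": 10}


def toy_execute(order: Order) -> float:
    """Stand-in for running the query: a made-up cost in which tables
    joined earlier are weighted more heavily, so small-first is cheaper."""
    n = len(order)
    return float(sum(TOY_ROWS[t] * (n - i) for i, t in enumerate(order)))


def make_swap_proposer(seed: int) -> Callable[[Order, History], Order]:
    """Stand-in for the LLM: swaps two random tables in the current best order."""
    rng = random.Random(seed)

    def propose(current: Order, history: History) -> Order:
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        return tuple(candidate)

    return propose


def tune_join_order(
    tables: Sequence[str],
    propose: Callable[[Order, History], Order],
    execute: Callable[[Order], float],
    max_iterations: int = 20,
) -> Tuple[Order, float, History]:
    """Propose, run, inspect, and retry until the iteration limit is reached."""
    best_order: Order = tuple(tables)       # start from the default order
    best_latency = execute(best_order)
    history: History = [(best_order, best_latency)]
    for _ in range(max_iterations):
        candidate = propose(best_order, history)  # proposer sees past results
        latency = execute(candidate)              # "run the query"
        history.append((candidate, latency))
        if latency < best_latency:                # keep the fastest plan so far
            best_order, best_latency = candidate, latency
    return best_order, best_latency, history


if __name__ == "__main__":
    best, latency, history = tune_join_order(
        list(TOY_ROWS), make_swap_proposer(seed=0), toy_execute
    )
    print(best, latency, len(history))
```

In a real setup the proposer would be a model call that reads the query and the execution history, and the executor would run the query with the chosen join order. The surrounding loop, which never returns a plan slower than the default it started from, would look much the same.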

Key numbers

- A database can answer the same SQL question in many ways, and Databricks says a prototype large language model agent picked faster join orders in most of its April 22 tests. (databricks.com)
- Databricks said the number of possible plans grows exponentially with the number of tables, and analytics queries often join 20 to 30 tables. (databricks.com)
- In the company’s reported benchmark results, the agent beat the Databricks optimizer in 80% of cases and improved query latency by 1.3x overall. (databricks.com)
- The follow-up writeup from the AI-Driven Research for Systems group said the agent typically searched for 10 to 20 iterations before outperforming the default plan on the Join Order Benchmark, or JOB. (ucbskyadrs.github.io) (github.com)
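
To put the exponential-growth figure in perspective, the short calculation below counts join plans for a given number of tables. It rests on simplifying assumptions that are not from the Databricks writeup: every ordering is counted, including ones a real optimizer would prune (for example, plans that need cross products), "left-deep" plans are counted as n! orderings, and "bushy" plans as n! leaf orderings times the number of binary tree shapes. The function names are illustrative.

```python
from math import comb, factorial


def left_deep_orders(n: int) -> int:
    """Number of left-deep join orders for n tables: one per permutation."""
    return factorial(n)


def bushy_plans(n: int) -> int:
    """Number of ordered binary join trees over n tables.

    Tree shapes with n leaves are counted by the Catalan number C(n-1);
    each shape can be labelled with the tables in n! ways.
    """
    catalan = comb(2 * (n - 1), n - 1) // n
    return factorial(n) * catalan


if __name__ == "__main__":
    # 3 tables: 6 left-deep orders, 12 bushy trees
    for n in (3, 10, 20, 30):
        print(n, left_deep_orders(n), bushy_plans(n))
```

Even under the narrower left-deep count, 20 tables already give 20! orderings, about 2.4 quintillion, which is why planners lean on statistics and heuristics rather than trying every plan.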

What happens next

Databricks said the number of possible plans grows exponentially with the number of tables, and analytics queries often join 20 to 30 tables. (databricks.com) Traditional query planners already try to solve that search problem with statistics and cost estimates, but major databases still expose knobs for cases where the default plan is not optimal. PostgreSQL’s current documentation says users sometimes improve plans by changing planner cost constants or collecting better statistics, and Databricks’ own docs say its optimizer can struggle on queries with many joins and aggregations. (postgresql.org) (docs.databricks.com)
