QueryData: Intent → Query

Conversations show interest in tools that translate a user’s intent into optimized database queries, hiding SQL complexity behind natural language. Technologists, in posts and demos, flagged intent-to-query approaches as a way to generate more efficient, schema-aware queries for analytics and app-level reporting (x.com) (x.com) (x.com).

A growing class of data tools is trying to turn plain-English intent into database queries, so users ask for “weekly revenue by region” instead of writing SQL. (spider2-sql.github.io) The basic idea is old enough to have a name, natural-language-to-SQL or text-to-SQL, but the newer pitch is narrower and more practical. Instead of asking a model to guess a whole database from scratch, vendors and developers feed it schema details, business metrics, and guardrails before it writes a query. (techcommunity.microsoft.com) (docs.getdbt.com)

That extra context matters because real company databases are messy. The Spider 2.0 benchmark, a 632-task test built from enterprise-style workflows, found that systems that look strong on older academic text-to-SQL tests still struggle on real-world jobs involving multiple SQL steps, varied database systems, and ambiguous business logic. (spider2-sql.github.io) (arxiv.org) Researchers behind Spider 2.0 reported that their code-agent framework solved 21.3% of tasks on Spider 2.0, versus 91.2% on Spider 1.0 and 73.0% on BIRD. That gap has pushed builders away from “ask anything” demos and toward systems that know a company’s tables, joins, and metric definitions in advance. (arxiv.org)

One route is the semantic layer, which acts like a shared dictionary for business data. dbt says its Semantic Layer, powered by MetricFlow, lets teams define metrics such as revenue once, handles joins automatically, and then exposes those governed definitions to downstream tools and applications. (docs.getdbt.com 1) (docs.getdbt.com 2)

Another route is stricter output control around the model itself. OpenAI’s Structured Outputs feature is designed to make model responses match a developer-supplied JSON schema, which is useful when an application needs the model to return a constrained query plan, filters, or tool call instead of free-form text. (developers.openai.com) (openai.com)

Cloud vendors are framing the same problem in retrieval terms: first find the right schema fragments, then generate the query. Microsoft’s November 20, 2024 guidance for Azure AI Search said natural-language-to-SQL systems work better when they retrieve relevant schema context, rank tables and columns, and narrow the model’s search space before generation. (techcommunity.microsoft.com) Google has described a similar pattern for BigQuery, pairing large language models with database metadata and rules for SQL generation. In practice, that means the “intent → query” layer is less about replacing databases and more about translating a user request into a query the warehouse can actually run. (cloud.google.com)

The current wave of interest is landing in analytics and app reporting, where the questions repeat and the acceptable metrics are already defined. That is a better fit than open-ended data exploration, because the tool can map “active users,” “net revenue,” or “this quarter” to company-specific definitions instead of inventing them. (docs.getdbt.com 1) (docs.getdbt.com 2)

The result is a quieter shift than the chatbot demos suggest: the useful product is often not a model that “knows SQL,” but a system that knows your schema, your metrics, and when not to guess. (arxiv.org) (techcommunity.microsoft.com)
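
To make the pattern concrete, here is a minimal Python sketch of the three ideas described above: a small dictionary of governed metrics, a retrieval step that ranks tables against the question, and a constrained query plan that is validated before any SQL is produced. Everything in it is an assumption made for illustration: the `METRICS` and `SCHEMA` contents, the `QueryPlan` shape, and the helper names are hypothetical and do not come from dbt, OpenAI, Microsoft, or Google. The keyword-overlap ranking is a crude stand-in for real schema retrieval, and the plan is written by hand where a real system would ask a model to return it.

```python
import re
from dataclasses import dataclass

# Hypothetical governed metric definitions, standing in for a semantic layer.
METRICS = {
    "net_revenue": "SUM(orders.amount - orders.refunds)",
    "active_users": "COUNT(DISTINCT events.user_id)",
}

# Hypothetical schema catalog: table name -> column names.
SCHEMA = {
    "orders": ["order_id", "region", "amount", "refunds", "order_date"],
    "events": ["event_id", "user_id", "event_type", "event_date"],
    "employees": ["employee_id", "name", "salary"],
}


def tokens(text):
    """Split text into a set of lowercase alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_tables(question, schema, top_k=2):
    """Rank tables by word overlap between the question and table/column names."""
    q = tokens(question)
    scored = []
    for table, columns in schema.items():
        vocab = tokens(table + " " + " ".join(columns))
        scored.append((len(q & vocab), table))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table for score, table in scored[:top_k] if score > 0]


@dataclass(frozen=True)
class QueryPlan:
    """Constrained plan a model would be asked to fill in, instead of raw SQL."""
    metric: str
    table: str
    group_by: str


def compile_plan(plan, allowed_tables):
    """Turn a plan into SQL, refusing anything outside the known definitions."""
    if plan.metric not in METRICS:
        raise ValueError(f"unknown metric: {plan.metric}")
    if plan.table not in allowed_tables:
        raise ValueError(f"table not in retrieved context: {plan.table}")
    if plan.group_by not in SCHEMA[plan.table]:
        raise ValueError(f"unknown column: {plan.group_by}")
    return (
        f"SELECT {plan.group_by}, {METRICS[plan.metric]} AS {plan.metric} "
        f"FROM {plan.table} GROUP BY {plan.group_by}"
    )


question = "net revenue by region"
candidates = rank_tables(question, SCHEMA)
# Hand-written here; in a real system this is the structured model output.
plan = QueryPlan(metric="net_revenue", table="orders", group_by="region")
print(compile_plan(plan, candidates))
```

The design choice worth noticing is that the model never writes the metric formula or picks from the full schema. A plan that names an undefined metric, or a table outside the retrieved context, raises an error instead of producing a guessed query.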
