Databricks embeds OpenTelemetry tracing

- Databricks said on May 22 it added managed OpenTelemetry tracing to Unity Catalog and introduced automatic prompt caching for open-source LLM workloads.
- Databricks said prompt caching on GPT-OSS cut P50 latency by 3x and increased throughput by 2.5x in production tests.
- Databricks published the features in product blogs and documentation on May 22, with Unity Catalog storage and SQL querying available now.

Databricks used a pair of product updates on May 22 to push more of the operational AI stack into its own platform. The company said AI agents can now write OpenTelemetry traces directly into Unity Catalog tables through a managed ingestion path, and that hosted open-source models on Databricks now support automatic prompt caching. The releases tie observability, governance and inference efficiency more tightly to Unity Catalog and Databricks Model Serving. Databricks described both features as aimed at production deployments rather than prototype workflows.

### Why is Databricks putting OpenTelemetry traces into Unity Catalog?

Databricks said traditional observability tools struggle with the volume, retention cost and governance needs of AI-agent trace data. In its May 22 blog post, the company said customers can now write OpenTelemetry, or OTel, traces directly into Unity Catalog tables through a fully managed, serverless ingestion path. (databricks.com)

Unity Catalog is Databricks’ governance layer, and the company said storing traces there lets teams apply table permissions, retain large trace volumes in Delta tables, and query them with Databricks SQL. Databricks documentation says OTel-formatted storage also improves compatibility with external systems and tools. (databricks.com)

### What changes for teams building agents?

Databricks said the tracing setup is designed to capture agent behavior across frameworks, tools and execution environments, rather than only inside a single Databricks-native workflow. The company’s blog said traces landed in Unity Catalog can be used for debugging, evaluation, monitoring and analytics workflows. (docs.databricks.com)

A separate Databricks post on May 19 said Unity Catalog and Unity AI Gateway already log request metadata including token counts, latency, requester identity and model destination across Databricks-hosted and external providers.
Read together, the new tracing feature extends that operating model from usage logs into richer execution traces, including tool activity. That is an inference from the company’s product posts and documentation, not a separately stated claim. (databricks.com)

### How does the prompt caching piece work?

Databricks said prompt caching reuses key-value caches for identical prompt prefixes, so the model can skip recomputing the prefill stage on repeated requests. In its May 22 post, the company said the feature is enabled automatically for open-source models across batch, pay-per-token and provisioned workloads, with no additional configuration required. (databricks.com)

The company said the feature is intended for workloads where the same system prompt or instruction prefix appears repeatedly, such as chat applications, document processing pipelines and AI agents. Databricks said cache hits lower latency and raise throughput because more tokens can be processed per model unit. (databricks.com)

### What performance numbers did Databricks publish?

Databricks said that, in production on GPT-OSS, prompt caching increased throughput by 2.5x and reduced P50 latency by 3x. The figures come from the company’s own product announcement; the announcement material reviewed did not pair them with a separate third-party benchmark. (databricks.com)

Databricks also said the caching rollout covers hosted open-source model workloads on its serving platform. Its framing was cost and speed: faster responses, better model-unit utilization and lower compute waste when prompt prefixes repeat.

### Where does this fit in Databricks’ broader AI control layer?

Databricks documentation published in May says Databricks Apps telemetry already collects traces, logs and metrics using OpenTelemetry and persists them to Unity Catalog tables.
(databricks.com) Another documentation page says users can configure standard OpenTelemetry SDKs and collectors to push traces, logs and metrics directly into Unity Catalog Delta tables without custom libraries. Those pieces show Databricks moving OTel beyond app-level telemetry into a broader platform pattern anchored in Unity Catalog.

The next step for users is practical: Databricks’ documentation says teams can store traces in Unity Catalog now, query them through Databricks SQL, and use automatic prompt caching on supported open-source model-serving workloads. (docs.databricks.com 1) (docs.databricks.com 2)
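
For readers who want a feel for the prefix-reuse idea behind prompt caching, here is a toy sketch in Python. It is an illustration only and does not reflect how Databricks implements the feature. The `PrefixCache` class, its `prefill` method and the block size of 4 are hypothetical. The sketch also remembers token prefixes rather than real attention key-value tensors. It shows only the accounting: tokens covered by an already-seen prefix are counted as cached, and the rest would need prefill work.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixCache:
    """Toy stand-in for a key-value cache keyed by prompt-token prefix.

    Hypothetical illustration: a real server would store attention
    key/value tensors, not just a record of which prefixes were seen.
    """
    block_size: int = 4  # arbitrary block granularity, chosen for the demo
    _seen: set = field(default_factory=set, repr=False)

    def prefill(self, tokens: list[str]) -> tuple[int, int]:
        """Return (cached_tokens, computed_tokens) for one request."""
        cached = 0
        still_matching = True
        # Only whole blocks are cacheable; a trailing partial block is
        # always recomputed.
        full = len(tokens) // self.block_size * self.block_size
        for end in range(self.block_size, full + 1, self.block_size):
            key = tuple(tokens[:end])
            if still_matching and key in self._seen:
                cached = end  # the entire prefix up to `end` is reusable
            else:
                still_matching = False
                self._seen.add(key)  # remember it for later requests
        return cached, len(tokens) - cached


# Demo: two requests share an 8-token system prompt.
system = "You are a concise support agent for Acme".split()
first = system + "How do I reset my password ?".split()
second = system + "Where is my invoice ?".split()

cache = PrefixCache()
print(cache.prefill(first))   # (0, 15): nothing cached yet
print(cache.prefill(second))  # (8, 5): the shared system prompt is reused
print(cache.prefill(first))   # (12, 3): a longer prefix now matches
```

In this toy model the second request recomputes only 5 of its 13 tokens, which is the kind of saving the article attributes to repeated system prompts. The actual latency and throughput gains depend on the serving system and are not modeled here.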
