Datadog partners with Google Cloud for GPUs
- Datadog said April 22 its GPU Monitoring product is generally available, and pitched it alongside Google Cloud tools for AI agents, GPUs, and TPUs. - Google Cloud and Datadog said the stack now spans Google’s Agent Development Kit, Vertex AI, Cloud TPU, and Datadog GPU fleet health telemetry. - The push extends a broader Google Cloud AI observability tie-up Datadog outlined in 2025. (cloud.google.com)
Graphics processing units, or GPUs, are the chips that train and run many artificial intelligence systems, and Datadog is now pushing deeper into watching how those chips are used on Google Cloud. On April 22, Datadog said its GPU Monitoring product is generally available and tied the launch to Google Cloud AI tooling at Google Cloud Next 2026. (datadoghq.com) (markets.businessinsider.com) Datadog’s pitch is that companies building AI apps and agents on Google Cloud want one place to track prompts, tool calls, latency, token use, infrastructure health, and cost. Its April 22 post said that includes optimizing cost and performance across both GPUs and Google’s tensor processing units, or TPUs. (datadoghq.com) For agents, which are software systems that take multiple steps and call tools on their own, Datadog said its LLM Observability product now supports auto-instrumentation for Google’s Agent Development Kit. Google said in a January post that Datadog LLM Observability can automatically instrument systems built with that kit. (datadoghq.com) (cloud.google.com) For the infrastructure underneath, Datadog said GPU Monitoring links device health, cost, and performance to the workloads and teams using those chips. The company said platform teams can break down total and idle GPU cost by tag, forecast demand, and spot stalled jobs, thermal throttling, and ECC or XID hardware errors. (datadoghq.com) Datadog framed the money problem in blunt terms. In its April 22 announcement, Chief Product Officer Yanbing Li said GPU instances account for 14 percent of compute costs, and said many companies still cannot charge back GPU spending across business units or see which workloads are wasting capacity. (markets.businessinsider.com) The Google Cloud angle is not a brand-new standalone partnership announced this week so much as an expansion of an existing alliance around AI observability. Google Cloud wrote in June 2025 that Datadog had expanded monitoring across Vertex AI Agent Engine, Gemini models in Vertex AI, Cloud TPU, BigQuery, and GPU fleets running on Google Cloud and elsewhere. (cloud.google.com) That earlier Google post also said Datadog’s GPU Monitoring product had been announced at DASH 2025, before this week’s broader availability push. The new April 2026 messaging packages that GPU view with agent observability and security tooling as Google Cloud markets what it calls an “agentic enterprise.” (cloud.google.com) (blog.google) The companies are also using the conference moment to highlight the relationship itself. Datadog said it won two 2026 Google Cloud Partner of the Year awards, in AIOps Technology and Infrastructure Modernization Marketplace, ahead of Google Cloud Next. (datadoghq.com) (cloud.google.com) So the clearest news here is not a fresh merger-style deal. It is Datadog taking a GPU monitoring product it had already previewed, making it broadly available on April 22, and slotting it into a wider Google Cloud stack for agents, models, accelerators, and cost controls. (markets.businessinsider.com) (datadoghq.com)