Google Cloud pairs with Datadog for GPU

- Google Cloud and Datadog said Datadog now spans Google’s AI stack, adding monitoring for Vertex AI Agent Engine, Cloud TPU, and GPU fleets.
- Datadog said its AI Agents Console can trace agents on Vertex AI Agent Engine, while GPU Monitoring links accelerator health, usage, and spend.
- The push extends a 2024 Vertex AI tie-up as Google pitches “agentic” enterprise tooling. (cloud.google.com)

Google Cloud and Datadog expanded their AI partnership, with Datadog adding monitoring for Vertex AI Agent Engine, Cloud TPU, and GPU fleets on Google Cloud. (cloud.google.com)

The companies outlined the update in a Google Cloud blog post published in June 2025, saying Datadog’s AI Agents Console now supports agents deployed through Vertex AI Agent Engine. The same post said Datadog’s GPU Monitoring product and Cloud TPU integration give Google Cloud users more visibility into accelerator bottlenecks and underused capacity. (cloud.google.com)

In plain terms, observability software is the dashboard for AI systems: it shows what an agent did, which tool it called, how long it took, and what hardware it burned through. Datadog said its tooling can trace agent steps on a single timeline and connect those traces to infrastructure metrics and cloud cost data. (cloud.google.com) (datadoghq.com)

That matters because generative artificial intelligence workloads are expensive in two places at once: model calls and accelerator time. Datadog said GPU Monitoring links device health, cost, and performance to the workloads and teams using them, while its setup docs say Google Cloud integration can add cloud cost and instance-type information. (datadoghq.com) (docs.datadoghq.com)

Datadog separately made GPU Monitoring generally available on April 22, 2026, saying the product is meant to help customers manage expanding AI costs as projects scale. The company said the tool gives a unified view across shared GPU fleets so teams can plan capacity, troubleshoot failures, and reduce wasted spend. (markets.businessinsider.com) (datadoghq.com)

Google has been building toward this in stages. In 2023, Google said its Ops Agent added NVIDIA GPU telemetry for Compute Engine virtual machines, and in 2023 Google also said Datadog had become the first full observability product for Vertex AI model monitoring. (cloud.google.com 1) (cloud.google.com 2)

The newer piece is agent monitoring, which focuses less on a single model response and more on a chain of actions. Google said Datadog’s Agent Development Kit integration can automatically instrument and trace agent orchestration, planner choices, and tool calls without code changes. (cloud.google.com)

Google Cloud Next 2026 put “agentic” software at the center of Google’s pitch to enterprise customers, alongside new Tensor Processing Units and agent platforms. In that context, Datadog’s role is to give customers one place to watch the agents, the chips, and the bill. (www.infoworld.com) (www.crn.com)

The tie-up is less a brand-new April 2026 deal than a broader rollout coming into focus as Datadog’s GPU product reaches general availability and Google sharpens its enterprise AI stack. The story in 2026 is that cloud vendors now want customers to measure model behavior and hardware efficiency in the same pane of glass. (cloud.google.com) (markets.businessinsider.com)
