CloudWatch adds Bedrock metrics
AWS pushed CloudWatch enhancements for Bedrock inference: it now tracks TTFT (time-to-first-token) and other inference metrics to help measure real-world latency for generative models, and adds new Application Signals SLO capabilities for service-level monitoring. This means you can instrument Bedrock inference paths with SLOs and get granular latency signals for production LLM usage. It is useful if you run RAG or live inference pipelines on AWS and need observability tied to model responsiveness.
AWS published the announcement on March 10, 2026, in its What's New feed (aws.amazon.com). The new metrics are emitted into the AWS/Bedrock CloudWatch namespace and are queryable from the CloudWatch console, CLI, and API for successful streaming model calls such as ConverseStream and InvokeModelWithResponseStream (aws.amazon.com). One metric, EstimatedTPMQuotaUsage, reports estimated tokens-per-minute quota consumption to help teams forecast throttling and set CloudWatch alarms tied to quota burn (aws.amazon.com). Amazon CloudWatch Application Signals added Bedrock integration in August 2024, and AWS has published guidance showing how Application Signals can be used to create SLOs that correlate model dimensions (ModelId, AgentAliasArn) with traces and logs (aws.amazon.com).
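As a rough illustration of an alarm tied to quota burn, here is a minimal Python sketch that builds the arguments for the CloudWatch `put_metric_alarm` call. The namespace and metric name come from the announcement as described above. Everything else is an assumption for illustration: that the metric is published with a `ModelId` dimension, that its value is in tokens per minute so it can be compared directly against your quota, the `Maximum` statistic, the 80% threshold, and the helper names `build_quota_alarm` and `create_alarm`. Check the metric's actual dimensions and units in your account before relying on it.

```python
from typing import Any


def build_quota_alarm(model_id: str, tpm_quota: float,
                      burn_fraction: float = 0.8) -> dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    Hypothetical helper: the ModelId dimension, the statistic, and the
    threshold logic are assumptions, not documented AWS behaviour.
    """
    if not 0 < burn_fraction <= 1:
        raise ValueError("burn_fraction must be in (0, 1]")
    return {
        "AlarmName": f"bedrock-tpm-quota-{model_id}",
        "Namespace": "AWS/Bedrock",
        "MetricName": "EstimatedTPMQuotaUsage",
        # Assumption: the metric is emitted per model.
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Maximum",
        "Period": 60,              # evaluate one-minute datapoints
        "EvaluationPeriods": 3,    # three consecutive breaching minutes
        # Assumption: metric value and quota share the same unit.
        "Threshold": tpm_quota * burn_fraction,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_alarm(params: dict[str, Any]) -> None:
    """Send the alarm definition to CloudWatch (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(build_quota_alarm("your-model-id", tpm_quota=100_000))` would register the alarm, where the model ID and quota value are placeholders for your own. Keeping the builder separate from the API call lets you inspect or unit-test the alarm definition without AWS access.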