Azure OpenAI token billing mismatch

- A new April 27 troubleshooting post said Azure OpenAI Realtime sessions can show token counts in `response.done` that do not match Azure billing meters. - The writeup says single-turn voice tests saw audio input billed at nearly six times expected levels and text input meters run up to three times higher. - Microsoft docs say Cost Management data can lag and cached tokens get discounted pricing. (learn.microsoft.com)

Azure OpenAI’s Realtime API can report one token total to an app and a different one to Azure Cost Management for the same voice turn. (technetexperts.com) (learn.microsoft.com) The issue surfaced in an April 27, 2026 post built around `gpt-realtime-mini-2025-12-15`, where a single 11-second voice request produced meter values that diverged from the `response.done` usage object. The author said audio input was billed at nearly 6x the expected level and text input at up to 3x. (technetexperts.com) A Microsoft Q&A thread dated March 31, 2026 describes the same pattern in a stripped-down test: one audio file, no tools, no custom system prompt, server voice activity detection enabled, and Whisper transcription turned on. (learn.microsoft.com) The basic split is between model telemetry and service billing. `response.done` reports what the language model says it used for that turn, while Azure billing can include work done by the broader audio pipeline that sits around the model. (technetexperts.com) That matters for teams building call centers, meeting assistants, and voice bots on Azure’s Realtime stack, which Microsoft pitches for low-latency audio over WebRTC, Session Initiation Protocol, and WebSocket connections. (learn.microsoft.com) The biggest extra cost in the writeup is transcription. When developers enable `RealtimeAudioTranscriptionOptions` with `whisper-1`, the audio stream is sent through a second path to produce text, even if the transcription event itself shows `seconds: 0` in usage. (technetexperts.com) (learn.microsoft.com) The post also argues Azure injects internal safety and alignment instructions that are billed but not exposed in the app-visible token count. That claim is not spelled out in Microsoft’s public pricing docs, but it matches the broader distinction Microsoft makes between usage context in Foundry and final charges in Cost Management. (technetexperts.com) (learn.microsoft.com) Caching adds another wrinkle. Microsoft says cached tokens can be billed at a discount on Standard deployments and up to a 100% discount on input tokens for Provisioned deployments, so app-side token math and invoice math will not always line up one-for-one. (learn.microsoft.com) Microsoft also warns against minute-by-minute reconciliation. Its Cost Management docs say usage records can appear with delay, and its Foundry cost guidance tells customers to compare costs over trend windows instead of real-time snapshots. (learn.microsoft.com 1) (learn.microsoft.com 2) The practical fix in the post is to treat `response.done` as application telemetry and Azure Cost Management as the billing source of truth, then reconcile by meter over time instead of per event. For finance teams and engineers sharing one dashboard, that is the difference between a usage chart and a bill. (technetexperts.com) (learn.microsoft.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.