Microsoft launches MAI-Transcribe-1
Microsoft rolled out MAI-Transcribe-1, a new speech-to-text model that the company claims delivers industry-leading accuracy across 25 languages. The model is now available to commercial users and is being flagged as a major move into AI audio tools for creators. The push includes broader AI-generated audio and voice-cloning capabilities, signaling faster access to high-quality transcription and voice tech for producers and mastering engineers (geekwire.com).
Microsoft posted the MAI-Transcribe-1 announcement on April 2, 2026, and listed the model as available through Microsoft Foundry and the MAI Playground. (microsoft.ai) The company said the model achieves the lowest Word Error Rate on the FLEURS benchmark, explicitly outperforming Scribe v2, Whisper-large-v3, GPT-Transcribe and Gemini 3.1 Flash-Lite. (microsoft.ai) Microsoft also reported that MAI-Transcribe-1 delivers batch transcription 2.5x faster than its existing Azure “Fast” option. (microsoft.ai) Pricing for the transcription model starts at $0.36 per hour of audio on Microsoft Foundry, and Microsoft said the MAI models are being rolled into the MAI Playground for testing. (techcommunity.microsoft.com) MAI-Transcribe-1 is currently offered in public preview, and Microsoft’s documentation warns that the preview is not covered by an SLA. The documented technical limits include no diarization, a 300 MB cap on audio-file size, and accepted formats of WAV, MP3 and FLAC. (learn.microsoft.com) The launch was packaged with MAI-Voice-1, which Microsoft says can produce a full minute of audio in under one second on a single GPU and is priced starting at $22 per 1 million characters. (techcommunity.microsoft.com)
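For producers budgeting a project, the quoted figures can be turned into a rough estimate. The Python sketch below is illustrative only: it uses the starting prices and preview limits reported above, the function names are hypothetical, it does not call any Microsoft API, and it assumes "300 MB" means 300 million bytes. Because the prices are "starting at" figures, the results should be read as lower-bound estimates.

```python
from pathlib import Path

# Figures as quoted in the article; actual billing may differ.
TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36   # MAI-Transcribe-1 starting price
VOICE_USD_PER_MILLION_CHARS = 22.0     # MAI-Voice-1 starting price
MAX_FILE_BYTES = 300 * 1000 * 1000     # assumption: "300 MB" read as decimal megabytes
ACCEPTED_EXTENSIONS = {".wav", ".mp3", ".flac"}


def transcription_cost(audio_seconds: float) -> float:
    """Estimated USD cost to transcribe the given duration of audio."""
    return audio_seconds / 3600 * TRANSCRIBE_USD_PER_AUDIO_HOUR


def voice_cost(characters: int) -> float:
    """Estimated USD cost to synthesize the given number of characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def preflight_problems(filename: str, size_bytes: int) -> list[str]:
    """Return the reported preview limits that a file would violate."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 300 MB cap")
    return problems


if __name__ == "__main__":
    # A 90-minute interview and a 12,000-character script.
    print(f"Transcription: ${transcription_cost(90 * 60):.2f}")
    print(f"Voice: ${voice_cost(12_000):.3f}")
    print(preflight_problems("interview.ogg", 120_000_000))
```

At these rates, ten hours of audio would come to roughly $3.60 for transcription, and a file in an unlisted format such as OGG would need to be converted to WAV, MP3 or FLAC before upload.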