Hume's EVI 3 voice in 1.2s
- Hume AI introduced EVI 3 on May 29, 2025, pitching a speech-to-speech model that handles listening, language, and speaking in one system and can generate any prompted voice or personality.
- In July, Hume said EVI 3’s API added voice cloning from 30 seconds of audio or less and support for more than 200,000 designed voices on its platform.
- The model sits in a fast-moving voice AI race with OpenAI and Google, as companies push lower-latency, more lifelike assistants for calls, support, and personal agents. (hume.ai)
Voice AI is software that listens, decides what to say, and speaks back. Hume AI says its EVI 3 model does all three in one system instead of chaining separate transcription, language, and text-to-speech tools. (hume.ai 1) (hume.ai 2)

Hume introduced EVI 3 on May 29, 2025, calling it its third-generation speech-language model. The company said the model can “speak with any voice and personality you create with a prompt” and respond at conversational latency. (hume.ai)

That architecture matters because delay is the part users notice first in live conversation. Hume says EVI 3 uses one model for both language and speech so it can answer faster than systems that split language generation from text-to-speech. (hume.ai 1) (hume.ai 2)

The company’s pitch is not just speed. Hume says EVI 3 tracks prosody (the tune, rhythm, and timbre in a person’s voice) so it can decide when to speak and shape its own tone in response. (hume.ai)

In Hume’s framing, that makes the system useful for customer service, accessibility, robotics, gaming, and personal assistants. The documentation says EVI can stop when a user interjects and resume with context, which is a basic requirement for call and meeting use. (hume.ai)

Hume also tied EVI 3 to custom voice creation. In its May launch post, the company said users could speak to more than 100,000 custom voices already created on its text-to-speech platform. (hume.ai)

On July 17, 2025, Hume expanded that pitch with an API release. The company said EVI 3 could speak expressively with “any voice, real or designed, without fine-tuning,” and that developers could use more than 200,000 designed voices through the platform. (hume.ai)

That same API launch added voice cloning. Hume said EVI 3 could capture a speaker’s timbre, accent, rhythm, tone, and parts of personality from 30 seconds of audio or less. (hume.ai)

Hume also said EVI 3 can work with outside language models while it is already speaking. Its launch materials describe parallel connections to reasoning models and web search systems, with later answers merged into quicker spoken responses. (hume.ai 1) (hume.ai 2)

The company positioned EVI 3 directly against other live voice systems. Hume said a blind comparison rated EVI 3 above OpenAI’s GPT-4o on empathy, expressiveness, naturalness, interruption quality, response speed, and audio quality. (hume.ai)

As of April 2026, Hume’s public documentation shows EVI 3 still available, while a newer EVI 4-mini supports 11 languages and requires a supplemental language model for some capabilities. EVI 3 remains the version with “quick responses” in Hume’s own comparison table. (hume.ai)

The open question is not whether voice AI can sound more like a person; Hume’s product pages already assume that. The question is how companies deploy fast voice cloning and tone-matching in calls, support lines, and assistants without losing control of consent, identity, and misuse. (hume.ai)