New Voice and Multilingual Models Go Open and Local

A wave of new models is making advanced voice and multilingual AI more accessible for on-device applications. Cohere released Tiny Aya, a family of open-weight models supporting over 70 languages on consumer hardware. In India, Gnani.ai unveiled Inya VoiceOS, a 5B-parameter voice-to-voice model that bypasses traditional text pipelines for lower latency. Meanwhile, open-source projects like Voicebox are enabling professional-grade voice cloning to run entirely offline.

- Cohere's Tiny Aya achieves its small footprint through a 3.35B parameter dense decoder-only Transformer architecture and can run on an iPhone 17 Pro at 32 tokens per second with 4-bit quantization. The model was trained efficiently on a relatively modest cluster of 64 H100 GPUs.
- To improve performance in underserved languages, Cohere developed specialized regional versions like TinyAya-Fire for South Asian languages and TinyAya-Earth for African languages, a process involving merging region-specific models with a global one.
- Inya VoiceOS is trained on a massive sovereign dataset for Indian languages, including over 14 million hours of multilingual speech data and 8 trillion text tokens for linguistic grounding. This allows it to handle code-mixed speech and preserve paralinguistic cues like emotion and tone.
- Gnani.ai's voice-to-voice model is part of the India AI Mission and was released as a research preview ahead of a planned, more powerful 14-billion-parameter version.
- The open-source Voicebox project is positioned as a local, free alternative to services like ElevenLabs, running entirely offline for privacy. It is built with a Rust-based backend for native performance, avoiding Electron, and leverages the Qwen3-TTS model for high-fidelity voice cloning from short audio samples.
- The trend towards on-device AI is driven by practical needs for lower latency, improved privacy, and reduced costs, shifting inference and personalization from the cloud to user hardware. This enables new applications like real-time, on-device translation in remote areas.
- For engineers exploring career paths, startups offer opportunities for rapid skill growth, greater autonomy, and potentially high rewards through equity, but come with less stability and more unstructured environments compared to big tech.
- The rise of powerful, open-weight models that can run locally is a key enabler for startups, allowing them to integrate advanced AI features like multilingual support or sophisticated voice interfaces without the high cost of cloud-based proprietary models.
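
To make the on-device footprint claim concrete, here is a rough back-of-the-envelope sketch of how bit-width affects weight storage for a 3.35B-parameter model. The function name `weight_memory_gb` is hypothetical, and the estimate assumes every weight is stored at the stated bit-width. It ignores quantization scales, activations, and the KV cache, so real memory use will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: all parameters use the same bit-width, with no overhead
    for quantization metadata, activations, or KV cache.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter count taken from the Tiny Aya bullet above.
TINY_AYA_PARAMS = 3.35e9

for bits in (16, 8, 4):
    gb = weight_memory_gb(TINY_AYA_PARAMS, bits)
    print(f"{bits}-bit weights: ~{gb:.2f} GB")
```

Under these assumptions, 4-bit weights take roughly a quarter of the space of 16-bit weights, which is what brings a model of this size within reach of phone-class memory.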
