Cohere Labs Releases 'Tiny Aya' Multilingual Model

Cohere Labs has released Tiny Aya, an open and efficient multilingual AI model. The model is intended to advance research and development within academic and non-profit communities. This release is part of a broader trend of major AI labs contributing smaller, specialized models to the open-source ecosystem.

- Tiny Aya is a 3.35 billion parameter, dense decoder-only Transformer model with an 8192 token context window; it utilizes Grouped Query Attention (GQA) and interleaved sliding window attention to optimize inference.
- The model was trained on the Aya Collection, a massive dataset of 513 million prompts and completions created by 3,000 independent researchers across 119 countries, which includes human-curated data, templated data from fluent speakers, and machine-translated datasets covering 114 languages.
- In performance benchmarks, Tiny Aya Global surpasses the larger GEMMA3-4B model in translation tasks for 46 out of 61 languages and achieves a 39.2% accuracy on the GlobalMGSM math benchmark for African languages, significantly outperforming GEMMA3-4B (17.6%).
- The model family includes a base pretrained version, a globally instruction-tuned version, and three region-specific variants: "Earth" for African and West Asian languages, "Fire" for South Asian languages, and "Water" for Asia-Pacific and European languages.
- Cohere emphasizes the model's efficiency, noting that the base models were trained on a relatively modest cluster of 64 H100 GPUs, making the technology more accessible for researchers and developers without massive compute resources.
- A key architectural feature is its 262k token vocabulary, which was specifically designed for more equitable language representation, reducing the fragmentation of text across different scripts and thereby improving inference efficiency.
- Tiny Aya is part of a broader "Aya" open-science initiative and model family from Cohere's non-profit research lab, which also includes larger models like Aya-101, Aya Expanse (8B and 32B parameters), and the multimodal Aya Vision.
- To enhance performance in low-resource languages without "catastrophic forgetting" of global safety alignments, Cohere employed novel post-training techniques like FUSION, which aggregates responses from multiple 'teacher' models, and SimMerge, which merges regional model checkpoints with the global one.
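
The idea of merging a regional checkpoint with the global one can be illustrated with a minimal sketch. This summary does not describe how SimMerge selects or weights checkpoints, so the code below uses plain linear interpolation of parameters as a generic stand-in, not SimMerge itself. The `Checkpoint` type, the `merge_checkpoints` function, and the `alpha` parameter are all hypothetical names introduced for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Checkpoint = Dict[str, List[float]]


def merge_checkpoints(
    global_ckpt: Checkpoint,
    regional_ckpt: Checkpoint,
    alpha: float = 0.5,
) -> Checkpoint:
    """Linearly interpolate two checkpoints with matching parameters.

    alpha = 0.0 returns the global weights unchanged;
    alpha = 1.0 returns the regional weights unchanged.
    This is an assumed, generic merge rule, not the SimMerge procedure.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if global_ckpt.keys() != regional_ckpt.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: Checkpoint = {}
    for name, global_weights in global_ckpt.items():
        regional_weights = regional_ckpt[name]
        if len(global_weights) != len(regional_weights):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Weighted average of each pair of corresponding weights.
        merged[name] = [
            (1.0 - alpha) * g + alpha * r
            for g, r in zip(global_weights, regional_weights)
        ]
    return merged
```

In a sketch like this, a smaller `alpha` keeps the merged model closer to the global checkpoint, which is one simple way to picture retaining global behavior while still picking up regional gains.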

