Meta Unveils Llama 4 for Multimodal Social Features
Meta has unveiled Llama 4, its next-generation model family built for multimodal social features spanning text, images, and eventually video. By deploying it across Facebook, Instagram, and WhatsApp, Meta is setting a new baseline for user-facing AI, one where creative, context-aware content generation is the expectation rather than a novelty. That raises the bar for engagement and retention that other consumer apps will be measured against.
Llama 4's architecture marks a significant shift from its predecessors by adopting a Mixture-of-Experts (MoE) design. Models such as Llama 4 Maverick and Scout activate only a fraction of their total parameters for each token, which makes inference more efficient than the dense transformer architecture of Llama 3, where every parameter participates in every forward pass.
A key differentiator for developers is the 10 million token context window in Llama 4 Scout. This leap from the ~128K-token window of Llama 3.1 enables applications that can reason over entire codebases or summarize lengthy documents without losing context. However, the model was trained with a context length of up to 256K tokens, so performance beyond that point can vary.
Unlike models where visual understanding is an add-on, Llama 4 is "natively multimodal," having been trained on text, images, and video from the ground up. This early fusion of data gives the model a more unified understanding of mixed media, so it can analyze documents that contain both text and images or explain visual content with more context.
Meta is directly challenging more expensive, closed models from competitors with its open-weight strategy. Benchmarks published by Meta show Llama 4 Maverick outperforming models such as OpenAI's GPT-4o and Google's Gemini 2.0 Flash in some key areas, offering startups frontier-level performance at a lower cost and without vendor lock-in.
Within Meta's own ecosystem, the Meta AI assistant is now powered by Llama 4. On Instagram and WhatsApp, users can invoke Meta AI in chats by typing "@" to get answers or generate images in real time as they type. A new AI Summaries feature has also been introduced to WhatsApp to quickly recap long group chats.
For an engineer at a startup, choosing between Llama 3 and Llama 4 involves trade-offs. Llama 4 offers cutting-edge capabilities, but Llama 3 can be more cost-effective, more stable, and less resource-intensive to deploy, which makes it the more practical choice for many current production systems.
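To make the Mixture-of-Experts idea from this section more concrete, here is a minimal, illustrative sketch of top-k expert routing in plain Python. It is not Meta's implementation: the expert count, the dimensions, the random gate weights, and the helper names (`make_expert`, `moe_forward`) are all assumptions chosen for the example, and each "expert" is a trivial scaling function standing in for a real feed-forward block. The point is only to show how a router can score every expert yet run just a few of them for each token.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: scales its input (a stand-in for a feed-forward block)."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=1):
    """Route one token vector `x` through the `top_k` highest-scoring experts."""
    # Router: one score per expert (dot product of the token with that expert's gate vector).
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Keep only the top_k experts; the others are never executed for this token.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize the selected scores so the mixing weights sum to 1.
    weights = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out, chosen


random.seed(0)
dim, num_experts = 4, 8  # toy sizes, assumed for illustration only
experts = [make_expert(i + 1) for i in range(num_experts)]
gate_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

token = [0.5, -1.0, 0.25, 2.0]
output, used = moe_forward(token, gate_weights, experts, top_k=2)
print(f"experts used: {used} ({len(used)} of {num_experts})")
```

In this toy setup only two of the eight experts run for the token, so most of the "parameters" stay idle on any single forward pass. Production MoE layers add details this sketch leaves out, such as learned gates, load balancing across experts, and batching many tokens at once.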
The release continues Meta's commitment to an open AI ecosystem, making the weights for Scout and Maverick available for download. This comes with a licensing restriction requiring companies with over 700 million monthly active users to seek a commercial license from Meta.