New AI Models Show 'Persistent Memory'

The next wave of AI capabilities is here, focusing on real-time interaction and long-term context. New systems, including the Qwen 3.5 model, are promoted as having "persistent memory" that keeps track of user preferences and earlier context. This enables more personalized outputs and smoother completion of multi-step tasks, moving beyond single-shot commands.

The efficiency of models like Qwen 3.5 stems from a "Mixture-of-Experts" (MoE) architecture. Instead of using all 397 billion of its parameters for every input, it activates only a fraction, about 17 billion (roughly 4% of the total), by routing each token to a small set of specialized sub-networks, or experts. This approach substantially reduces computational cost and increases speed, delivering up to 19 times faster performance on long-context tasks compared to previous versions.

Qwen 3.5's ability to handle massive amounts of information (262,144 tokens natively, extendable to over 1 million) is powered by Gated Delta Networks. This is a form of linear attention whose cost grows in proportion to input length, avoiding the quadratic growth in computation that affects standard attention in traditional transformer architectures. This mechanism allows the model to efficiently process long documents, videos, and entire codebases within a single session.

The term "persistent memory" in this context refers to an ultra-long context window, not a permanent memory of all past interactions. The model remembers information for the duration of a single, extended session. True cross-session memory, where an AI remembers you from one day to the next, typically requires a different technique called Retrieval-Augmented Generation (RAG). RAG gives AI a long-term memory by connecting it to an external knowledge base, like a vector database. Instead of holding all information in its active memory, the AI retrieves relevant facts from this database when needed. This is how many AI agents achieve personalization and learn from past interactions over time.

The push for massive context windows has become a key battleground. While Qwen 3.5-Plus offers a 1-million-token window, it competes with models like Google's Gemini 1.5 Pro, which also boasts a 1-million-token capacity. Anthropic's Claude 3.5 Sonnet, by comparison, has a 200,000-token context window.

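To make the RAG idea concrete, here is a minimal sketch of the retrieval step in Python. It is a toy illustration and not how Qwen 3.5 or any particular product implements memory. The names MemoryStore, embed, and build_prompt are hypothetical. A word-count vector stands in for a real embedding model, and a plain in-memory list stands in for a vector database that would normally be saved between sessions.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: a real system uses a learned embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class MemoryStore:
    """Hypothetical stand-in for a vector database holding facts from past sessions."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def retrieve(self, query, k=1):
        """Return the k stored facts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(store, question, k=1):
    """Prepend retrieved facts to the question before it is sent to a model."""
    facts = store.retrieve(question, k)
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Known about the user:\n{context}\n\nQuestion: {question}"


# Facts saved during earlier sessions (illustrative data only).
store = MemoryStore()
store.add("user prefers answers in metric units")
store.add("user works in the logistics industry")
store.add("user likes short bullet point summaries")

print(build_prompt(store, "which units should answers use"))
```

The model itself stays stateless in this sketch. Only the store carries information forward, and only the few facts that match the current question are placed in the context window.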
Beyond just text, Qwen 3.5 is a native multimodal model, trained from the ground up on text, images, and video simultaneously. This allows it to understand and reason across different data types within the same input, a crucial capability for developing more sophisticated "agentic" AI that can interact with user interfaces and applications.

Shared from Scout - Be the smartest in the room.