Apple Prepping Major Siri AI Overhaul for April

Apple is reportedly preparing a major generative AI upgrade for Siri to be released by April and showcased at WWDC 2026. The overhaul aims to enable more natural, context-aware conversations and automate multi-step tasks, potentially unlocking powerful new API capabilities for developers.

The upcoming Siri capabilities are powered by a family of foundation models, including a roughly 3-billion-parameter on-device model optimized for Apple silicon. This on-device model relies on techniques such as a shared KV cache and 2-bit quantization-aware training to run efficiently, reaching a generation rate of 30 tokens per second on an iPhone 15 Pro. For more complex requests, the system offloads work to a larger server-side model running on Private Cloud Compute (PCC). That server model uses a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer architecture, which combines sparse computation with both global and local attention to deliver higher-quality results efficiently.

The backend runs on custom-built servers powered by Apple silicon, including the Secure Enclave, extending device-level security into the data center. To protect user privacy, PCC is designed to be stateless: user data is used only ephemerally to fulfill a request and is never stored or made accessible to Apple personnel. The servers run a hardened, minimal subset of iOS/macOS, which narrows the attack surface and reuses existing security features such as code signing and sandboxing.

For developers, the overhaul exposes new capabilities through an enhanced App Intents framework and a new Swift-centric "Foundation Models" framework. These tools will let apps integrate more deeply with Siri, giving it awareness of on-screen content and enabling more complex, multi-step actions through natural-language commands. The models can also be specialized with pluggable, task-specific LoRA (Low-Rank Adaptation) adapters: small adapter modules for functions such as summarization or code generation can be swapped in on the fly without altering the base model's weights.
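Quantization-aware training, mentioned above for the on-device model, typically inserts a quantize-then-dequantize ("fake quantization") step into the forward pass so the network learns weights that survive rounding to very few values. The sketch below is a minimal plain-Python illustration, assuming a single per-tensor scale and a hypothetical set of four symmetric levels (2 bits per weight); the article does not describe Apple's actual scheme, and the function names are illustrative only.

```python
# Minimal sketch of 2-bit "fake quantization" for quantization-aware training.
# Assumptions (not from the article): one scale per tensor and the four
# symmetric levels below. Apple's actual recipe may differ.

LEVELS = (-1.5, -0.5, 0.5, 1.5)  # 4 levels -> 2 bits per weight (assumed set)


def choose_scale(weights):
    """Pick a scale so the largest-magnitude weight maps onto an outer level."""
    max_abs = max(abs(w) for w in weights)
    return max_abs / LEVELS[-1] if max_abs > 0 else 1.0


def quantize_2bit(weights, scale):
    """Map each weight to a 2-bit code (0..3): the index of the nearest level."""
    return [
        min(range(len(LEVELS)), key=lambda i: abs(w / scale - LEVELS[i]))
        for w in weights
    ]


def dequantize(codes, scale):
    """Turn 2-bit codes back into real-valued weights."""
    return [LEVELS[c] * scale for c in codes]


def fake_quantize(weights):
    """QAT forward pass: quantize, then dequantize, so training 'sees' the
    rounding error. (A straight-through estimator would treat this step as
    the identity when computing gradients; no autograd is modeled here.)"""
    scale = choose_scale(weights)
    return dequantize(quantize_2bit(weights, scale), scale)


def pack_codes(codes):
    """Store four 2-bit codes per byte, lowest bits first."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, c in enumerate(codes[i:i + 4]):
            byte |= (c & 0b11) << (2 * j)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    w = [0.9, -0.2, 0.35, -1.2, 0.05]
    codes = quantize_2bit(w, choose_scale(w))
    print(codes)              # [3, 1, 2, 0, 2]
    print(fake_quantize(w))   # weights snapped to at most 4 distinct values
    print(pack_codes(codes))  # 5 weights fit in 2 bytes
```

At 2 bits per weight, packed storage is one-eighth the size of a 16-bit representation (ignoring the small overhead of the scales), which helps explain why such aggressive quantization is attractive for a model running on a phone.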
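The pluggable-adapter idea can be illustrated with a single toy linear layer. This is a minimal sketch, assuming the standard LoRA formulation y = Wx + (alpha / r) * B(Ax), where W is frozen and only the small matrices A and B differ per task. The class names and the "summarize" and "proofread" adapters are hypothetical; the code does not reflect Apple's implementation or the Foundation Models framework's API.

```python
# Minimal sketch of swappable LoRA adapters on one linear layer.
# Hypothetical names (LoRAAdapter, AdaptedLinear, "summarize", "proofread")
# are for illustration only; real adapters touch many transformer layers.
from dataclasses import dataclass


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    a: list  # r x d_in  (down-projection)
    b: list  # d_out x r (up-projection)
    alpha: float = 1.0

    @property
    def rank(self):
        return len(self.a)

    def delta(self, x):
        """Low-rank update: (alpha / r) * B @ (A @ x)."""
        scale = self.alpha / self.rank
        return [scale * v for v in matvec(self.b, matvec(self.a, x))]


class AdaptedLinear:
    """Frozen base weights plus an optional, swappable adapter."""

    def __init__(self, weight):
        self.weight = weight  # never modified by any adapter
        self.adapters = {}
        self.active = None

    def register(self, name, adapter):
        self.adapters[name] = adapter

    def activate(self, name):
        self.active = name  # None means "base model only"

    def __call__(self, x):
        y = matvec(self.weight, x)
        if self.active is not None:
            d = self.adapters[self.active].delta(x)
            y = [yi + di for yi, di in zip(y, d)]
        return y


if __name__ == "__main__":
    layer = AdaptedLinear([[1.0, 0.0], [0.0, 1.0]])
    layer.register("summarize", LoRAAdapter(a=[[1.0, 1.0]], b=[[0.5], [0.0]]))
    layer.register("proofread", LoRAAdapter(a=[[0.0, 1.0]], b=[[0.0], [2.0]], alpha=2.0))
    x = [2.0, 3.0]
    print(layer(x))              # [2.0, 3.0]
    layer.activate("summarize")
    print(layer(x))              # [4.5, 3.0]
    layer.activate("proofread")
    print(layer(x))              # [2.0, 15.0]
```

Swapping tasks only changes which (A, B) pair is applied; the base weights stay untouched. The savings come from the rank: for an illustrative 4096 x 4096 layer with rank 16, an adapter stores 2 x 16 x 4096 = 131,072 values versus 16,777,216 in the base matrix, about 0.78%.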
Developers will be able to train their own adapters to specialize the on-device model for their apps.

This entire effort traces back to an internal framework codenamed "Ajax," which was built to unify machine learning development across the company. According to reports, Ajax is built on top of Google's JAX, a framework for high-performance numerical computing and machine learning research, and runs on Google Cloud for development.

Shared from Scout - Be the smartest in the room.