Gemini distilled on iPhone
Engineers have demonstrated distilling larger models such as Gemini to run locally on iPhone hardware, with one example reaching roughly 30 tokens/sec on an iPhone 15 Pro. The result highlights the tradeoffs driving on-device AI work: model size, latency, and hardware capability. (x.com)
A March 25 report in The Information says Apple was granted “complete access” to Google’s Gemini inside Google data centers, giving Apple rights that go beyond simple API calls. (theinformation.com) The same reporting says Apple can run distillation inside its own infrastructure and use Gemini’s internal reasoning traces, including chain-of-thought outputs, as training signals to produce much smaller “student” models. (mactech.com)
Apple already ships a ~3-billion-parameter on-device foundation model and a larger server model for Private Cloud Compute, an existing two-tier architecture that distilled Gemini student models could slot into. (machinelearning.apple.com) Apple confirmed that WWDC will run June 8–12, 2026; industry outlets and Apple-adjacent reporting identify the keynote as the likely moment to reveal Siri and Apple Intelligence features that use Gemini-derived models. (apple.com)
Public technical workflows and vendor examples show the practical path Apple would follow: generate high-quality labels or reasoning traces from a large teacher model, then train smaller student networks on those outputs so that they approximate the teacher’s behavior on constrained hardware. (labelbox.com) Apple’s own technical documents describe on-device model optimizations (KV-cache sharing, quantization-aware training, and other architectural tweaks) that mirror the efficiency engineering required to make distilled Gemini students viable on iPhone silicon, while heavier tasks remain targeted at cloud/Private Cloud Compute. (arxiv.org)
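The teacher-student workflow described above can be illustrated with a toy sketch. This is not Apple's or Google's pipeline: `teacher_label` is a hypothetical stand-in (a fixed formula rather than a large language model), the student is a two-weight logistic regression rather than a compact transformer, and all names and hyperparameters are assumptions chosen for illustration. The student here is also no smaller than the teacher; the point is only the data flow, in which the teacher labels inputs and the student is fit to those labels.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def teacher_label(x):
    """Hypothetical teacher: returns a soft label P(class = 1 | x).

    Assumption: a fixed formula stands in for a large model's output.
    """
    return sigmoid(3.0 * x[0] - 2.0 * x[1] + 0.5)


def train_student(inputs, soft_labels, epochs=500, lr=1.0):
    """Fit a logistic-regression student to the teacher's soft labels
    with full-batch gradient descent on the cross-entropy loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, target in zip(inputs, soft_labels):
            pred = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            err = pred - target  # d(cross-entropy) / d(logit)
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def agreement(w, b, inputs):
    """Fraction of inputs on which student and teacher pick the same class."""
    same = 0
    for x in inputs:
        student_says = (w[0] * x[0] + w[1] * x[1] + b) > 0.0
        teacher_says = teacher_label(x) > 0.5
        same += student_says == teacher_says
    return same / len(inputs)


def sample_points(count):
    return [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(count)]


random.seed(0)
train_x = sample_points(500)
soft_labels = [teacher_label(x) for x in train_x]  # step 1: teacher labels the data
w, b = train_student(train_x, soft_labels)         # step 2: student imitates the teacher
test_x = sample_points(200)
print("student/teacher agreement:", agreement(w, b, test_x))
```

Training on the teacher's probabilities rather than hard 0/1 labels is the design choice that distinguishes distillation from ordinary supervised learning: the soft labels carry information about how confident the teacher is on each input.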