Apple Intelligence Bypassed
Researchers found a prompt‑injection technique that tricked Apple Intelligence on iPhones, iPads, Macs and Vision Pro into executing attacker‑controlled behavior, showing on‑device models can still be steered by malicious content. The reports argue millions of users could be exposed and that the flaw underscores the need for strict trust boundaries, provenance-aware context assembly, and app‑level sandboxing around model invocations ( ).
A language model is basically autocomplete with a much bigger memory: you give it text, and it predicts the next words by following instructions hidden in the prompt it receives. Apple built one of these models into Apple Intelligence so features in Mail, Messages, Notes, Photos, Safari, and Siri can rewrite, summarize, and act on text on the device itself. (machinelearning.apple.com) That “prompt” is not just what you type. It also includes invisible instructions from Apple and, in many cases, extra text pulled in from the app that called the model, which means the model is reading a bundle of context, not a single sentence. (machinelearning.apple.com) Prompt injection is what happens when an attacker hides new instructions inside that bundle and gets the model to follow them instead of the original rules. It is less like hacking the operating system and more like slipping a fake note into a stack of real paperwork so the assistant obeys the wrong boss. (theregister.com) Researchers at RSAC said they found a way to do exactly that against Apple Intelligence on supported iPhones, iPads, Macs, and Apple Vision Pro. Their attack forced the on-device model to produce attacker-controlled output even though Apple had input filters, output filters, and model-level guardrails in place. (9to5mac.com) The first piece of the trick was a text-direction control character called Unicode RIGHT-TO-LEFT OVERRIDE. The researchers wrote a harmful string backwards so filters inspecting the raw text saw gibberish, while the screen still showed the text in the intended order. (9to5mac.com) The second piece was a technique called Neural Exec. RSAC researcher Dario Pasquini developed it to use machine learning to search for instruction patterns that make a model drop its original rules and follow a new command instead. (theregister.com) Put together, the Unicode trick helped the payload get past Apple’s filters, and Neural Exec helped the payload steer the model once it was inside. In RSAC’s test with 100 random prompts, the attack worked 76 times. (theregister.com) The scale here comes from where Apple put the model. The Register reported RSAC’s estimate of at least 200 million Apple Intelligence-capable devices in use by December 2025, plus as many as 1 million App Store apps that could call into the system. (theregister.com) Apple was told about the issue on October 15, 2025, and RSAC says protections added in iOS 26.4 and macOS 26.4 block the specific attack they built. 9to5Mac also reported that Apple has since hardened its safeguards. (theregister.com, 9to5mac.com) The awkward lesson is that “on-device” and “private” are not the same thing as “unsteerable.” Apple’s own research says the on-device model is designed for low-latency use inside apps and is also exposed to developers through the Foundation Models framework, which makes trust boundaries around app-supplied context a security problem, not just a product design detail. (machinelearning.apple.com) That is why this story is not really about a model saying rude words. It is about a system that reads mixed text from users, apps, and hidden instructions, and the RSAC result shows that if those sources are not kept separate with strict provenance and sandboxing, the model can be talked into serving the attacker instead of the user. (theregister.com, 9to5mac.com)