Apple models vulnerable to prompt injection

- RSAC researchers said on April 8-9 they bypassed Apple Intelligence guardrails, steering Apple’s on-device model into attacker-chosen outputs through prompt injection. - The attack worked in 76 of 100 prompt tests and paired a “Neural Exec” jailbreak with Unicode right-to-left text obfuscation. - Apple says it hardened safeguards, but the result dents the idea that local inference alone solves AI security.

Apple’s on-device AI was supposed to have a built-in advantage. The model lives inside the operating system, not on some random third-party server, and apps reach it through Apple’s own Foundation Models API. That sounds safer. But this week RSAC researchers showed that “local” does not mean “unhackable.” They found a way to steer Apple Intelligence into producing attacker-directed outputs by chaining prompt injection with a Unicode text trick, and Apple has since tightened the guardrails. (rsaconference.com) ### What actually got broken? The target was Apple Intelligence’s on-device language model — the small local model Apple exposes to apps through system APIs. RSAC’s point was not that they stole the model or cracked open its weights. The point was that they could make the mo(rsaconference.com) but the behavior inside it got redirected. (rsaconference.com) ### How did the attack work? It was a two-part trick. First came “Neural Exec,” a prompt-injection technique meant to override the model’s intended instructions and replace them with the attacker’s. Second came a Unicode RIGHT-TO-LEFT OVERRIDE character, which let the resear(rsaconference.com)ext, while a human sees the visually reordered version. Stack the two together and you get both control and camouflage. (9to5mac.com) ### Why is Unicode the sneaky part? Because filters and humans do not always read the same thing. Think of it like writing a message in a mirror and then hanging the mirror in front of the guard. The person sees the readable sentence. The scanner may see the reversed underlying text. RSAC us(9to5mac.com)e model returned it. (9to5mac.com) ### How well did it work? Well enough to matter. RSAC says it tested the attack on 100 random prompts and succeeded 76% of the time. The researchers also said that before Apple’s fixes, they estimated between 100,000 and 1 million Apple customers were already using apps vulnerable to this ki(9to5mac.com)025, which is why this was worth trying in the first place. (rsaconference.com) ### What could an attacker do with that? The scary version is not “make the bot swear.” It is “make the bot misuse data or tools inside an app.” RSAC framed the threat around apps that feed outside content — emails, PDFs, web pages — into the model. If a malicious document c(rsaconference.com) including health and fitness data and family videos. (rsaconference.com) ### Didn’t Apple already patch it? Basically, yes. Apple has since hardened its safeguards, and Apple’s own developer docs show updated Foundation Models guardrails tied to newer OS releases, including iOS 26.4, iPadOS 26.4, macOS 26.4, and visionOS 26.4. But the bigger less(rsaconference.com)s, app permissions, and cached credentials. (9to5mac.com) ### Why does this matter beyond Apple? Because Apple picked the privacy-first architecture everyone points to as the safer path. And even there, prompt injection still landed. That lines up with a broader security reality: indirect prompt injection is becoming a platform problem, not just a (9to5mac.com)urface stops being “the model” and starts being the whole workflow around the model. (rsaconference.com) ### Bottom line The news here is not that Apple Intelligence is uniquely broken. It is that one of the most locked-down consumer AI stacks still got pushed off course by crafted text. On-device inference helps with privacy. It does not erase prompt injection. The hard part now is building systems that assume hostile text will get in — and still fail safely. (rsaconference.com)

Apple models vulnerable to prompt injection

Get your own daily briefing