Qualcomm and DeepMind push edge AI

- Qualcomm EVP Durga Malladi used EDGE AI San Diego 2026 to argue that AI is shifting onto phones, cars, wearables, and industrial devices.
- DeepMind’s Chintan Parikh and Weiyi Wang paired that with tooling news: Gemma 4, LiteRT, and on-device agent workflows.
- The real fight is no longer just bigger models. It is memory traffic, thermal limits, and software that matches silicon.

Edge AI is the part of the AI story where the model runs on the device in your hand, on the camera in a factory, or inside a car, not only in a datacenter. That matters because the economics change fast when inference moves closer to the user. Latency drops. Privacy improves. Network costs shrink. This week’s useful signal was that Qualcomm and Google DeepMind were pushing the same idea from different ends: Qualcomm from silicon and systems, DeepMind from models and runtime tooling. (youtube.com)

### Why are people suddenly talking about edge AI?

Because the old assumption, that the smartest model has to live in the cloud, is breaking down. At EDGE AI San Diego 2026, Qualcomm’s Durga Malladi framed the shift as a move from cloud-only AI toward hybrid edge-plus-cloud systems, with AI spreading across phones, PCs, wearables, ve[…] (youtube.com) […] similar: Gemma 4 is pitched explicitly for agentic workflows that can run on-device, not just in a browser tab talking to a server. (youtube.com)

### What changed to make that plausible?

The models got smaller and the runtimes got better. Malladi highlighted roughly 10x improvements in model efficiency as one reason edge deployment is becoming practical. Google’s AI Edge stack has been moving in the same direction: LiteRT replaces the older TensorFlow Lite branding and adds better[…] (youtube.com) […]er cross-platform generative AI deployment for models like Gemma. In short, the software layer is catching up to the hardware. (youtube.com)

### Why is memory the real bottleneck?

Because moving data is often more expensive than doing the math. Qualcomm has been unusually direct about this in its own edge-LLM work: the company says memory bandwidth had to be reduced with tricks like distillation, quantization-aware training, and speculative decoding to make on-device langua[…] (youtube.com) […]point in broader language: edge performance is defined by memory, compute, and transport together, not by raw TOPS on a slide.
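A rough way to see why memory dominates: when a dense language model generates text one token at a time (batch size 1), each new token typically requires streaming every weight from memory once. Memory bandwidth therefore caps tokens per second no matter how many TOPS the NPU advertises. The sketch below applies that simplified bound. The 4B parameter count and 60 GB/s bandwidth are hypothetical figures chosen for illustration, not numbers from Qualcomm or DeepMind, and the model ignores KV-cache traffic, activations, and on-chip caching.

```python
# Back-of-envelope ceiling on decode speed for a dense LLM at batch size 1.
# Assumption: every generated token streams all weights from DRAM once;
# KV-cache reads, activations, and on-chip caching are ignored.

def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Bytes needed to store the weights at a given precision."""
    return num_params * bits_per_weight / 8


def max_tokens_per_second(num_params: float, bits_per_weight: int,
                          bandwidth_gb_per_s: float) -> float:
    """Upper bound on tokens/s when decoding is memory-bandwidth bound."""
    bytes_per_token = weight_bytes(num_params, bits_per_weight)
    return bandwidth_gb_per_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    params = 4e9        # hypothetical 4B-parameter model
    bandwidth = 60.0    # hypothetical phone-class DRAM bandwidth, GB/s
    for bits in (16, 8, 4):
        gb = weight_bytes(params, bits) / 1e9
        tps = max_tokens_per_second(params, bits, bandwidth)
        print(f"{bits:>2}-bit weights: {gb:4.1f} GB, <= {tps:5.1f} tokens/s")
```

With these assumed numbers, dropping from 16-bit to 4-bit weights cuts the footprint from 8 GB to 2 GB and raises the ceiling from 7.5 to 30 tokens/s, which is why quantization (and training that keeps accuracy acceptable at low precision) matters so much on-device. Speculative decoding attacks the same bound from another angle: a small draft model proposes several tokens and the large model checks them in one pass, so a single read of the big weights can yield more than one token.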
Think of it like a kitchen where the chef is fast but the ingredients keep arriving one at a time. (qualcomm.com)

### So where does Qualcomm fit?

Qualcomm is trying to be the full-stack edge AI supplier. AI Hub lets developers convert models into multiple on-device runtimes, quantize and fine-tune them, and validate performance on real Qualcomm hardware. That matters because edge AI dies in deployment if the m[…] (qualcomm.com) […]e can help you get a model onto an actual device, in an actual power envelope, with an actual runtime.” (aihub.qualcomm.com)

### And where does DeepMind fit?

DeepMind is pushing the model-and-tooling side. Gemma 4 is open under Apache 2.0 and pitched for multi-step, autonomous workflows on local hardware, with support for visual processing and more than 140 languages. Google’s AI Edge team is also showing how LiteRT, NPU acceleration, and ahead-of-time compilation ca[…] (aihub.qualcomm.com) […] That is the less glamorous part of edge AI, but it turns out to be the part that decides whether users keep the feature turned on. (developers.googleblog.com)

### Does this kill the cloud story?

No, it changes the split. Qualcomm is explicitly talking about hybrid inference and real-time routing between edge and cloud, not a total replacement. The likely pattern is simple: training and giant workloads stay centralized, whil[…] (developers.googleblog.com) […]the cloud becomes backup, coordinator, and heavy lifter. (youtube.com)

### Why should product teams care?

Because this stops being a pure research problem the second you ship. On-device AI forces product, silicon, and platform teams to care about the same scorecard: latency, battery drain, thermal headroom, model size, memory traffic, and fallback behavior when the network disappears. The catch is that n[…] (youtube.com) […] is a co-design problem now. (youtube.com)

### Bottom line?

The loudest edge-AI claim is that value is leaving the datacenter. That is too simple.
The more useful version is that value is being redistributed. The winners will be the companies that can make models, runtimes, and silicon behave like one system — and Qualcomm and DeepMind are both trying to define that stack from opposite directions. (youtube.com)
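The hybrid split described under "Does this kill the cloud story?", together with the product scorecard's "fallback behavior when the network disappears", ultimately shows up in an app as a routing decision. Below is a minimal sketch of such a policy. The `Request` and `DeviceState` shapes, the `needs_large_model` flag, and every threshold are hypothetical and chosen for illustration; this is not Qualcomm's routing logic or any published API. The policy prefers on-device inference, escalates to the cloud when a request exceeds what the local model handles or the device is hot or low on battery, and falls back to a degraded local answer when there is no network.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    EDGE = "edge"
    CLOUD = "cloud"
    EDGE_DEGRADED = "edge_degraded"  # local answer, flagged as lower quality


@dataclass
class Request:
    prompt_tokens: int
    needs_large_model: bool  # hypothetical flag set by an upstream classifier


@dataclass
class DeviceState:
    network_up: bool
    thermal_headroom: float  # hypothetical metric: 0.0 = throttling, 1.0 = cool
    battery_fraction: float  # 0.0 .. 1.0


# Illustrative thresholds, not values from any vendor.
MAX_LOCAL_CONTEXT = 4096
MIN_THERMAL_HEADROOM = 0.2
MIN_BATTERY = 0.15


def route(req: Request, dev: DeviceState) -> Target:
    """Pick where to run inference: edge first, cloud as heavy lifter and backup."""
    fits_locally = (not req.needs_large_model
                    and req.prompt_tokens <= MAX_LOCAL_CONTEXT)
    device_ok = (dev.thermal_headroom >= MIN_THERMAL_HEADROOM
                 and dev.battery_fraction >= MIN_BATTERY)

    if fits_locally and device_ok:
        return Target.EDGE
    if dev.network_up:
        return Target.CLOUD
    # No network: the on-device model is the only option left.
    return Target.EDGE_DEGRADED
```

For example, a short prompt on a cool, charged phone (`route(Request(200, False), DeviceState(True, 0.8, 0.9))`) stays on `Target.EDGE`, while the same prompt on a throttling device with the network up goes to `Target.CLOUD`. The ordering encodes the pattern the article describes: routine work stays local for latency and privacy, the cloud absorbs the heavy requests, and a missing network degrades the feature instead of breaking it.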

Shared from Scout - Be the smartest in the room.