Local AI and agent infrastructure

Google positioned Gemma 4 as a model family built for local and mobile deployment, and Cloudflare expanded its Agent Cloud to offer deployment, security and orchestration tools for fleets of agents. Together, the announcements shift the conversation from model capability alone to where and how agents are hosted: on-device, on-site, or on cloud-managed agent platforms. The two vendor moves underline an emerging split between local inference for latency and centralised control planes for policy and orchestration. (cloudwars.com) (siliconangle.com) (morningstar.com)

Google and Cloudflare used the past two weeks to answer a different artificial intelligence question: not just how smart a model is, but where an agent should live. (blog.google) (cloudflare.com)

On April 2, Google introduced Gemma 4, an open model family it said was built to run on developers’ own hardware, from Android phones and laptops to workstations. Google released four sizes (Effective 2B, Effective 4B, 26B Mixture of Experts, and 31B Dense) under an Apache 2.0 license. (blog.google) Google’s pitch was local execution: Gemma 4 supports more than 140 languages, multimodal input, and agent-style tasks such as multi-step planning and offline code generation. In Android’s AICore Developer Preview, Google said the E2B model runs three times faster than E4B, while the new on-device model is up to four times faster than previous versions and uses up to 60% less battery. (developers.googleblog.com) (android-developers.googleblog.com)

An agent is software that works through a task in steps instead of answering once, like a clerk who can read, decide, and act. Running that agent on a phone or laptop can cut delay and keep data on the device, while running it in a managed cloud can make it easier to supervise, update, and secure. (developers.googleblog.com) (cloudflare.com)

Cloudflare made the case for the cloud side on April 13, when it said it was expanding Agent Cloud with infrastructure, security, and developer tools for production workloads on its network. The company said the package is aimed at moving agents from laptop demos to “millions of autonomous, long-running agents” running across Cloudflare’s global platform. (cloudflare.com)

Cloudflare’s argument is that agents do not fit the older cloud pattern, in which a small number of applications serve many users. In a post opening its Agents Week on April 12, the company said agents are “one-to-one” workloads that need their own execution environment, persistent state, and room to call tools dynamically. (blog.cloudflare.com)

That design shows up in the products Cloudflare is shipping around Agent Cloud. On April 13, Cloudflare said Sandboxes and Cloudflare Containers were generally available, with features including secure credential injection, persistent code interpreters, snapshots, background processes, and Active CPU Pricing for fleets of agents. (blog.cloudflare.com)

Google is not abandoning the cloud in this model. Its Google Cloud blog said Gemma 4 can also be self-deployed inside a customer’s Google Cloud environment, with variants ranging from edge-focused 2B models to a 31B model for more complex orchestration. (cloud.google.com)

The split is becoming clearer in product terms: local models handle fast, private inference near the user, while cloud platforms handle policy, identity, persistence, and scale for many agents at once. Google’s April 2 launch and Cloudflare’s April 13 expansion turned that architecture choice into the real story of the past two weeks. (developers.googleblog.com) (cloudflare.com)
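The local-versus-cloud split described above can be made concrete as a placement rule for individual agent steps. The sketch below is hypothetical: `AgentStep`, its four flags, `Placement`, and `choose_placement` are illustrative names, not part of Gemma 4, Android AICore, or Cloudflare Agent Cloud, and both the rule ordering and the default are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    ON_DEVICE = "on-device"
    MANAGED_CLOUD = "managed cloud"


@dataclass(frozen=True)
class AgentStep:
    """One step of an agent's work, described by hypothetical attributes."""
    handles_private_data: bool    # should the data stay on the user's device?
    latency_sensitive: bool       # is a user waiting on this step?
    needs_persistent_state: bool  # must state survive across sessions?
    needs_central_policy: bool    # must identity, credentials or policy be enforced centrally?


def choose_placement(step: AgentStep) -> Placement:
    """Toy rule set mirroring the split: control plane in the cloud, inference near the user."""
    # Control-plane concerns (policy, identity, persistence) favour a managed cloud.
    if step.needs_central_policy or step.needs_persistent_state:
        return Placement.MANAGED_CLOUD
    # Private or latency-sensitive inference favours running on the device.
    if step.handles_private_data or step.latency_sensitive:
        return Placement.ON_DEVICE
    # Assumed default: with no local-only constraint, use the cloud for scale.
    return Placement.MANAGED_CLOUD


if __name__ == "__main__":
    draft_reply = AgentStep(
        handles_private_data=True, latency_sensitive=True,
        needs_persistent_state=False, needs_central_policy=False,
    )
    fleet_task = AgentStep(
        handles_private_data=False, latency_sensitive=False,
        needs_persistent_state=True, needs_central_policy=True,
    )
    print(choose_placement(draft_reply).value)  # on-device
    print(choose_placement(fleet_task).value)   # managed cloud
```

Checking control-plane needs first is only one possible ordering; a step that both handles private data and needs central policy could instead be split, with inference kept on the device and credential handling left to the cloud.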
