Serverless at scale: GCP Cloud Run for agents
An engineer published a production write-up describing how they run autonomous agents on Cloud Run, with event-driven scaling to zero, one container per agent to limit the blast radius of failures, Firestore for communication between agents, and safety layers at the infrastructure level. The pattern suggests that serverless primitives can absorb bursty agent workloads while providing isolation and lower idle costs. (Moses Acosta on X)
An autonomous agent is just software that takes a goal, calls tools, and keeps making decisions until it finishes or fails. The hard part is not the “thinking” step; it is giving that loop somewhere safe and cheap to run when traffic arrives in bursts instead of a steady stream. (cloud.google.com)
Most cloud systems were built for web apps that receive a request, send a response, and go quiet. Agents behave more like pop-up workshops: one minute nothing is happening, and the next, 500 jobs want browsers, code execution, and API calls at once. (cloud.google.com)
That is why this Google Cloud Run pattern is getting attention. Cloud Run is Google’s managed container service, and by default it uses request-based billing: an instance is charged while it starts up, processes requests, and shuts down, not while it sits idle. (cloud.google.com)
Cloud Run also scales down to zero instances when no requests are arriving, which is the serverless property people care about here. Google lets you keep instances warm by setting a minimum instance count, but that minimum defaults to zero and the service scales in and out with incoming requests, which is what makes bursty workloads affordable. (cloud.google.com)
The production write-up described giving each agent its own container instead of packing many agents into one long-lived worker. That changes the failure mode from “one bad tenant can poison the whole process” to “one container crashes and only that agent’s job is affected,” the same basic idea as giving every experiment its own lab bench. (x.com)
Cloud Run lets you push that isolation further through concurrency, the maximum number of requests one container instance handles at the same time (80 by default). Google’s docs note that if you need strict per-request isolation, you can set concurrency to 1 so that each instance processes only one request at a time. (cloud.google.com) To coordinate across those separate containers, the agents in this setup used Cloud Firestore as a shared notebook.
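A per-agent worker in this style can stay stateless: each request names one agent, loads that agent’s state from the shared store, runs one step, and writes the result back. The sketch below is a minimal illustration under stated assumptions, not the write-up’s code: it is a plain Python WSGI app (Cloud Run only needs the container to serve HTTP on the port given in the `PORT` environment variable), an in-memory dict stands in for Firestore, and the `/agents/<agent_id>/step` route and `run_agent_step` are hypothetical names.

```python
import json
import os
from wsgiref.simple_server import make_server

# Hypothetical in-memory stand-in for a Firestore collection (agent_id -> state).
# Process memory is lost when Cloud Run scales an instance away, so a real
# worker would read and write Firestore documents here instead.
STORE: dict[str, dict] = {}


def run_agent_step(state: dict) -> dict:
    """Hypothetical agent step: a placeholder that just counts steps."""
    steps = state.get("steps", 0) + 1
    return {**state, "steps": steps, "status": "done" if steps >= 3 else "running"}


def app(environ, start_response):
    """WSGI handler for the illustrative route /agents/<agent_id>/step."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) != 3 or parts[0] != "agents" or parts[2] != "step":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    agent_id = parts[1]
    state = STORE.get(agent_id, {})
    try:
        new_state = run_agent_step(state)
    except Exception as exc:
        # A failing step is recorded on this agent's record only. Isolation
        # between agents comes from Cloud Run's containers, not from this code.
        new_state = {**state, "status": "failed", "error": str(exc)}
    STORE[agent_id] = new_state

    body = json.dumps({"agent_id": agent_id, **new_state}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def serve() -> None:
    """Entrypoint for the container image (not called in this sketch).

    Cloud Run passes the port to listen on in the PORT environment variable.
    wsgiref is fine for a sketch; a production image would use a real WSGI server.
    """
    port = int(os.environ.get("PORT", "8080"))
    make_server("", port, app).serve_forever()
```

Deployed with concurrency set to 1, each instance would handle a single agent step at a time, and because state round-trips through the store on every request, any instance (including a fresh one after scaling to zero) can pick up the next step.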
Firestore is Google’s document database, and it supports real-time listeners that fire when a document changes, so one agent can write state and another service can react without constant polling. (firebase.google.com)
That matters because autonomous systems spend much of their time waiting on outside events. When a browser step finishes, a tool returns data, or a human approves an action, Firestore can act as the handoff table where the next worker sees the updated card and picks it up. (firebase.google.com)
The safety layer in this design sits below the agent code rather than in the prompt alone. Cloud Run containers run in a sandboxed environment, and Google documents gVisor as the container runtime sandbox used in the first-generation execution environment; gVisor intercepts the container’s system calls in a user-space kernel instead of passing them straight to the host kernel, which is meant to reduce the blast radius of risky code paths. (cloud.google.com, gvisor.dev)
There are still limits. Cloud Run request timeouts default to 300 seconds and can be raised to a maximum of 60 minutes, so an agent that needs to stay alive for hours has to break its work into resumable steps instead of expecting one request to do everything. (cloud.google.com, discuss.google.dev)
The bigger idea is that agent infrastructure may end up looking less like one giant “AI platform” and more like old-fashioned cloud primitives used carefully. Containers give isolation, Firestore gives coordination, and serverless scaling gives a way to pay for spikes without paying for silence. (cloud.google.com, cloud.google.com, firebase.google.com)
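The request timeout and the Firestore handoff described above fit together in a simple pattern: a worker runs steps until it nears its deadline, checkpoints its progress, and marks the job as needing continuation so that a listener can schedule the next request. The sketch below is an in-memory illustration, not the write-up’s implementation: `DocStore` is a hypothetical stand-in for a Firestore collection (the real Python client, google-cloud-firestore, exposes listeners through `on_snapshot`), the 300-second budget mirrors Cloud Run’s default timeout, and the 30-second margin, the `cursor`/`status` fields, and the function names are illustrative assumptions.

```python
import time
from typing import Callable

DEFAULT_TIMEOUT_S = 300.0  # Cloud Run's default request timeout
SAFETY_MARGIN_S = 30.0     # hypothetical headroom; must exceed the longest single step


class DocStore:
    """Hypothetical in-memory stand-in for a Firestore collection with change listeners."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}
        self.listeners: list[Callable[[str, dict], None]] = []

    def on_change(self, callback: Callable[[str, dict], None]) -> None:
        self.listeners.append(callback)

    def set(self, doc_id: str, data: dict) -> None:
        self.docs[doc_id] = dict(data)
        for callback in self.listeners:  # push every write to listeners, no polling
            callback(doc_id, dict(data))


def run_until_deadline(
    store: DocStore,
    job_id: str,
    steps: list[Callable[[], None]],
    clock: Callable[[], float] = time.monotonic,
    timeout_s: float = DEFAULT_TIMEOUT_S,
) -> None:
    """Resume from the saved cursor, run steps, and checkpoint before the timeout."""
    deadline = clock() + timeout_s - SAFETY_MARGIN_S
    cursor = store.docs.get(job_id, {}).get("cursor", 0)
    while cursor < len(steps):
        if clock() >= deadline:
            # Out of time for this request: hand the rest of the job to a new one.
            store.set(job_id, {"cursor": cursor, "status": "needs_continuation"})
            return
        steps[cursor]()
        cursor += 1
        # Checkpoint after every step so a killed request resumes from here.
        store.set(job_id, {"cursor": cursor, "status": "running"})
    store.set(job_id, {"cursor": cursor, "status": "done"})


def wire_continuations(store: DocStore, pending: list[str]) -> None:
    """Queue a job for another request whenever it checkpoints mid-run.

    In a deployed system this callback would trigger the next Cloud Run
    request; how that happens is outside this sketch.
    """

    def on_change(doc_id: str, data: dict) -> None:
        if data.get("status") == "needs_continuation":
            pending.append(doc_id)

    store.on_change(on_change)
```

A step that was in progress when a request was killed will run again on resume, so steps in this pattern should be safe to repeat.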