Gemini Interactions API adds multi‑step agent workflows to enable persistent agentic tasks

- Google is now pushing the Gemini Interactions API as the main way to build agentic apps, with server-side state and background execution in beta.
- The key shift is architectural: one endpoint can drive plain model calls, tool use, and managed agents like Deep Research.
- That matters because Gemini is moving from chat completions toward long-running software agents that developers can actually orchestrate.

Google is turning Gemini into more than a prompt-in, answer-out API. The Interactions API, now positioned as the standard interface for new Gemini projects, is built for agents that keep state, call tools, and keep working in the background instead of stopping after one response. The real story is not that Google added another endpoint, but that its developer stack is being reorganized around long-running agent workflows rather than classic chat turns.

### What is this API actually for?

The Interactions API is Google’s newer Gemini interface for multi-turn, multimodal, agent-style work. It still handles simple one-shot requests, but the docs now frame it as the default primitive for new builds because it can manage server-side state, complex conversations, and tool-driven flows without the developer stitching everything together by hand. (ai.google.dev)

### Why is server-side state such a big deal?

Because a stateless API forces the client to carry the agent’s entire memory. If the app has to resend the whole conversation, the current plan, and every tool result on each turn, orchestration gets brittle fast. With server-side state, Google keeps the interaction context on its side, so an app can resume work, branch tasks, and manage longer sessions more like software processes than like repeated chat messages. (ai.google.dev)

### What changed in practice?

The important change is that Gemini’s interface is no longer centered on a rigid prompt transcript. It is centered on interactions: units that can include user input, model reasoning steps, tool calls, tool results, and follow-up actions. Google’s migration guide makes the point clearly: developers moving off `generateContent` are being steered toward an API optimized for agentic workflows, not just text generation. (blog.google)

### Why do tool calls matter more than prompts here?

Because tools are how an agent does real work.
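Concretely, a tool-driven interaction ends up as a record of steps rather than a single reply. The sketch below is a hypothetical, self-contained illustration, not Google’s API: `fake_model` is a hard-coded stand-in for Gemini, `STORE` is an in-memory dict standing in for server-side state, and `create_interaction`, `previous_id`, `Step`, and the `add` tool are invented names. It shows the model requesting a tool, the app executing it and feeding the result back, and a follow-up turn that passes only the earlier interaction’s ID instead of resending the history.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One unit inside an interaction (hypothetical schema)."""
    kind: str  # "user_input", "tool_call", "tool_result", or "model_output"
    content: object


@dataclass
class Interaction:
    id: str
    previous_id: str | None
    steps: list[Step] = field(default_factory=list)


# In-memory dict standing in for state kept on the provider's side.
STORE: dict[str, Interaction] = {}
_ids = itertools.count(1)

# Hypothetical tool registry: name -> Python function the app executes.
TOOLS: dict[str, Callable[..., object]] = {"add": lambda a, b: a + b}


def fake_model(history: list[Step]) -> Step:
    """Stand-in for the model: request a tool once, then answer from its result."""
    last = history[-1]
    if last.kind == "user_input":
        return Step("tool_call", {"name": "add", "args": {"a": 2, "b": 3}})
    if last.kind == "tool_result":
        return Step("model_output", f"The sum is {last.content}.")
    raise ValueError(f"unexpected step kind: {last.kind}")


def create_interaction(user_input: str, previous_id: str | None = None) -> Interaction:
    """Run one interaction; earlier context is loaded by ID, not resent by the client."""
    history = list(STORE[previous_id].steps) if previous_id else []
    interaction = Interaction(id=f"int-{next(_ids)}", previous_id=previous_id)
    interaction.steps.append(Step("user_input", user_input))
    while True:
        step = fake_model(history + interaction.steps)
        interaction.steps.append(step)
        if step.kind == "model_output":
            break
        # The model asked for a tool: the app runs it and records the result.
        call = step.content
        result = TOOLS[call["name"]](**call["args"])
        interaction.steps.append(Step("tool_result", result))
    STORE[interaction.id] = interaction
    return interaction


first = create_interaction("What is 2 + 3?")
print([s.kind for s in first.steps])  # ['user_input', 'tool_call', 'tool_result', 'model_output']
print(first.steps[-1].content)  # The sum is 5.
follow_up = create_interaction("And again?", previous_id=first.id)
print(follow_up.previous_id == first.id)  # True
```

Only `previous_id` is passed on the second call; in a stateless design the client would have to send `first.steps` back itself on every turn.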
A prompt can describe a job, but a tool lets the model search, fetch, compute, write, or hand off to another system. Google’s tooling docs and agents overview both lean on this idea: agents are systems that plan, execute actions, interact with external systems, and synthesize results. In other words, the model stops being just a talker and starts acting like an orchestrator. (ai.google.dev)

### Is this only for custom agents?

No, and that is one of the more revealing details. Google says the same API can talk to raw Gemini models and to managed agents, including Gemini Deep Research. That means Google is collapsing “call a model” and “invoke an agent product” into one surface. For platform teams, that is convenient. But it also blurs product boundaries: an app may either build the workflow itself or delegate more of it to Google’s managed agent layer. (ai.google.dev)

### Where does background execution fit in?

This is the piece that makes “persistent tasks” real. Google describes background execution as part of the Interactions API design, which means an agent can keep working after the immediate request cycle ends. That is useful for research, multi-step retrieval, and any workflow where the answer is not available in one pass. In effect, Gemini is being shaped for jobs that look more like queued work than chat replies. (blog.google)

### What does this force developers to decide?

Once the model can act across tools and sessions, governance becomes product design. Teams have to decide which tools the agent can touch, how identities persist across sessions, what gets logged, when humans approve actions, and whether the agent merely suggests steps or actually executes them. Google’s docs emphasize tools, agents, and managed workflows, but the operational burden (permissions, audits, failure handling) lands on whoever ships the app. (blog.google)

### So what is the bottom line?
Google is making a bet that the next important API primitive is not the chat completion but the interaction loop. If that sticks, the winning developer platforms will not just wrap a model with a prompt. They will manage sessions, tools, approvals, and long-running tasks as first-class product features. (ai.google.dev)
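The background-execution and governance points above can be made concrete with a small sketch. It is a hypothetical illustration, not Google’s API: the tool names, the `ALLOWED_TOOLS` and `NEEDS_APPROVAL` policy sets, `start_background`, and the `approve` callback are invented for this example, and a local thread pool stands in for server-side background execution. The caller submits a multi-step task, gets control back immediately, and polls for the result later, while every tool call passes an allowlist, an optional human-approval gate, and an audit log.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# Hypothetical governance policy: tools the agent may use at all,
# and the subset that needs a human sign-off before running.
ALLOWED_TOOLS = {"search", "summarize", "send_email"}
NEEDS_APPROVAL = {"send_email"}

# Every decision is recorded so it can be audited later.
audit_log: list[str] = []

Approver = Callable[[str], bool]


def run_step(tool: str, approve: Approver) -> str:
    """Apply the policy to one tool call, then (pretend to) execute it."""
    if tool not in ALLOWED_TOOLS:
        audit_log.append(f"blocked {tool}")
        return "blocked"
    if tool in NEEDS_APPROVAL and not approve(tool):
        audit_log.append(f"rejected {tool}")
        return "rejected"
    time.sleep(0.01)  # stands in for slow work such as a web search
    audit_log.append(f"ran {tool}")
    return "done"


def agent_task(plan: list[str], approve: Approver) -> dict[str, str]:
    """Run a multi-step plan; each step goes through the policy gate."""
    return {tool: run_step(tool, approve) for tool in plan}


# A local thread pool stands in for server-side background execution.
executor = ThreadPoolExecutor(max_workers=2)


def start_background(plan: list[str], approve: Approver) -> Future:
    """Submit the task and return at once; the caller polls the Future later."""
    return executor.submit(agent_task, plan, approve)


job = start_background(["search", "delete_files", "send_email"], approve=lambda tool: False)
while not job.done():
    time.sleep(0.005)  # polling loop; a real client might check a status endpoint instead
print(job.result())  # {'search': 'done', 'delete_files': 'blocked', 'send_email': 'rejected'}
```

The policy check sits inside each step rather than only at submission time, so a long-running task cannot drift into tools nobody approved, and the audit log captures refusals as well as successful calls.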
