Turn agents into backend services

- OpenAI, LangChain, Temporal, and Inngest are all now pushing the same idea: agents should run as durable workflows, not disposable chat sessions.
- The concrete pattern is strikingly consistent: persisted state, checkpointed steps, idempotent tool calls, replay after failure, and trace-level observability.
- That shifts agent building away from prompt craft alone and toward backend engineering: orchestration, storage, retries, approvals, and recovery.

AI agents are starting to look a lot less like chatbots and a lot more like backend jobs with a language model attached. That’s the real shift here. Over the past few months, the major agent stacks have converged on the same architecture: keep state, checkpoint progress, resume after failure, and treat tool calls like production workflow steps rather than one-shot prompt tricks. OpenAI’s current Agents docs say the app should own orchestration, tool execution, approvals, and state, while LangGraph and Temporal-style systems are leaning hard into durable execution and replay.

### Why are people saying “backend service” now?

Because the failure mode changed. A chatbot can fail and you just ask again. An agent doing real work can’t. If the run has already searched documents, called three APIs, written a draft, and paused for approval, losing that state is not a minor annoyance: it means duplicate side effects, wasted tokens, and broken user trust. That’s why the newer agent guidance keeps sounding like workflow engineering. (developers.openai.com)

### What does “durable” actually mean?

Basically, the system saves progress at meaningful boundaries and can resume from there later. LangGraph describes durable execution as saving workflow progress at key points so it can pause and resume exactly where it left off, even after delays or failures. Temporal’s pitch is the same idea in backend language: durably record each workflow step, then replay from the point of failure without redoing completed work. (docs.langchain.com)

### Why isn’t chat history enough?

Because memory is doing two different jobs. One is short-term working state: what happened in this run, which tool returned what, which branch the agent is on. The other is long-term memory: user facts, preferences, prior outcomes, application data. LangGraph now spells that split out directly, with thread-level short-term memory and separate long-term memory across sessions.
That’s a backend data model, not just a prompt window. (docs.langchain.com)

### Why do idempotent tools matter so much?

Because replay is dangerous if side effects repeat. If an agent resends an email, recharges a card, or rewrites a record every time a workflow resumes, the recovery system becomes the bug. LangGraph’s durable execution docs explicitly tell developers to make workflows deterministic and idempotent and to wrap side-effecting operations inside tasks, so resumed runs retrieve stored results instead of repeating the action. That is classic distributed-systems discipline showing up inside agent design. (docs.langchain.com)

### Where does orchestration fit in?

Right in the middle. OpenAI’s current docs frame agents as applications that plan, call tools, collaborate across specialists, and keep enough state to finish multi-step work. Their orchestration guide then splits the design into handoffs and agents-as-tools: basically, who owns the workflow, and when a specialist is just a bounded helper. That’s not a UI pattern. It’s control-flow design. (docs.langchain.com)

### Why is observability suddenly a first-class feature?

Because agent failures often don’t look like crashes. The system can run for 200 steps and still be wrong. LangGraph’s overview now puts tracing and runtime visibility next to durable execution and memory as core benefits. Once you have branching runs, tool retries, human approvals, and multiple specialists, traces become the only sane way to debug what happened. (developers.openai.com)

### Is this just one vendor’s framing?

No, and that’s the notable part. OpenAI is talking about orchestration and state. LangChain is talking about persistence, checkpointers, and traceability. Temporal and Inngest are explicitly arguing that agent reliability is a durable-execution problem, not just a model-quality problem. When different stacks converge on the same constraints, that usually means the architecture is real.
(docs.langchain.com)

### So what changes for builders?

The center of gravity moves. Prompting still matters, but the hard part increasingly lives in state machines, retries, approval boundaries, storage, and side-effect control. The agent is becoming the decision layer on top of a workflow runtime. Or, more bluntly: if you want an agent to do real work, you now have to build it like a backend service. (openai.com) (developers.openai.com)
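To make “build it like a backend service” concrete, here is a minimal Python sketch of checkpointed steps, a durable approval pause, and replay that skips completed work. It assumes an in-memory dict standing in for a real database, and every name in it (`CheckpointStore`, `Run`, `PausedForApproval`, the tool functions, the `run_id:step` idempotency-key scheme) is hypothetical. It is not the API of LangGraph, Temporal, Inngest, or the OpenAI Agents SDK.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


class PausedForApproval(Exception):
    """Raised when a run reaches an approval gate that no human has signed off yet."""


@dataclass
class CheckpointStore:
    """Hypothetical in-memory stand-in for a durable store (e.g. a DB table), keyed by run ID."""
    data: dict[str, dict[str, Any]] = field(default_factory=dict)

    def load(self, run_id: str) -> dict[str, Any]:
        return self.data.setdefault(run_id, {})

    def save(self, run_id: str, step: str, result: Any) -> None:
        # JSON round-trip mimics persisting the result outside the process.
        self.load(run_id)[step] = json.loads(json.dumps(result))


class Run:
    def __init__(self, run_id: str, store: CheckpointStore) -> None:
        self.run_id = run_id
        self.store = store

    def step(self, name: str, fn: Callable[..., Any], *args: Any) -> Any:
        done = self.store.load(self.run_id)
        if name in done:
            # Replay: a completed step returns its stored result instead of running again.
            return done[name]
        result = fn(*args)
        self.store.save(self.run_id, name, result)  # checkpoint after the step succeeds
        return result

    def require_approval(self, name: str) -> None:
        # Durable pause: the run cannot pass this point until an approval is recorded.
        if self.store.load(self.run_id).get(name) != "approved":
            raise PausedForApproval(name)


# Hypothetical tools; the counters and fake outbox make re-execution visible.
CALLS = {"search": 0, "draft": 0}
SENT: dict[str, str] = {}  # fake email provider: idempotency key -> recipient


def search_docs(query: str) -> list[str]:
    CALLS["search"] += 1
    return [f"doc about {query}"]


def write_draft(docs: list[str]) -> str:
    CALLS["draft"] += 1
    return f"Summary of {len(docs)} doc(s)"


def send_email(key: str, to: str, body: str) -> str:
    # Simulated provider that honors idempotency keys: a repeated key is a no-op.
    SENT.setdefault(key, to)
    return f"msg-{key}"


def workflow(run: Run) -> str:
    docs = run.step("search", search_docs, "refund policy")
    draft = run.step("draft", write_draft, docs)
    run.require_approval("approve_send")
    # The key is derived from run and step, so any retry of this step reuses it.
    return run.step("send", send_email, f"{run.run_id}:send", "customer@example.com", draft)


store = CheckpointStore()
try:
    workflow(Run("run-42", store))
except PausedForApproval as pause:
    print("paused at", pause)                     # paused at approve_send
store.save("run-42", "approve_send", "approved")  # a human approves later
print(workflow(Run("run-42", store)))             # msg-run-42:send
print(workflow(Run("run-42", store)))             # msg-run-42:send again, replayed, not resent
print(CALLS, len(SENT))                           # {'search': 1, 'draft': 1} 1
```

The two layers cover different gaps. The checkpoint keeps completed steps from re-running when the workflow resumes, but a crash after the email goes out and before its result is saved would still re-execute the `send` step on replay; the idempotency key, honored here by a simulated provider, turns that retry into a no-op. A production runtime would persist checkpoints to real storage and add retries, timeouts, and tracing on top of this skeleton.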
