Thread: 9 non‑LLM ways agents fail
Maryam Miradi laid out nine non‑LLM failure causes for production agents — things like API rate limits, recursive collapse, and latency‑cost spirals — underscoring that model behavior is only one slice of reliability risk. (x.com)
Miradi’s public posts frame the list as nine non‑LLM production failure causes and open with “Context Window Overflow,” explaining that multi‑turn pipelines can run out of useful context well before they hit API token limits. (youtube.com) The thread calls out API rate limits and silent timeouts as root causes that keep surfacing when agents assume unlimited third‑party call capacity, a failure mode that other engineering writeups say leads to hidden retries and cascading errors. (youtube.com) Miradi also names repeated tool calls and “tool storms” that produce latency‑cost spirals; contemporary field guides recommend step budgets, a max_rounds cap, and execution‑level cost monitoring to break that feedback loop. (youtube.com)
A central remediation in her materials is observability: she prescribes run‑level traces, schema‑validated tool contracts, and dead‑letter queues so that every agent decision can be replayed and audited rather than inferred from final outputs alone. (maryammiradi.com) On platform design, Miradi argues for an “agent‑native” integration layer and reusable SDK primitives that centralize memory APIs, guarded tool calls, and policy enforcement, an approach mirrored by enterprise agent‑data‑plane vendors that expose governed connectors and full replayability. (maryammiradi.com) She repeatedly warns against premature fine‑tuning and all‑in rollouts, noting in her 56‑page field guide and recent posts that most production instability traces to orchestration and infrastructure (routing, retrieval, memory) rather than the base model. (maryammiradi.com)