Two new agent‑tooling papers
A new paper called 'Act Wisely' describes multimodal agents that introspect before calling tools, checking whether a call is necessary and predicting its outcome, while a Stanford/Harvard paper on Agent‑Supervised Tool Adaptation proposes freezing the core model and training tools or sub‑agents instead, for a large data‑efficiency win in production. Both works focus on reducing data costs and improving agent tool use without changing base architectures. (x.com/KryptonAi/status/2042934874881843626, x.com/AlphaSignalAI/status/2042619044625355059)
An artificial intelligence agent is a model that can call search engines, code runners, or other software while it works. Two new papers argue those agents should get better at deciding when to use a tool and that engineers can often improve the tool stack without retraining the core model. (arxiv.org, arxiv.org)

One paper, “Act Wisely,” was posted to arXiv on April 9, 2026. It studies multimodal agents, systems that read images as well as text, and says current models often make “blind” tool calls even when the answer is already visible in the prompt. (arxiv.org)

The authors of “Act Wisely” propose a training method called Hierarchical Decoupled Policy Optimization, or HDPO. Instead of mixing accuracy and tool-use penalties into one reward, the method keeps one channel for getting the task right and a second channel that rewards using fewer tools, applied only to trajectories that were already correct. (arxiv.org)

The paper says that setup let its model, called Metis, cut tool invocations by orders of magnitude while also improving reasoning accuracy. The authors frame the gain in terms of latency and noise: fewer unnecessary calls can mean fewer delays and fewer chances for outside tools to derail the answer. (arxiv.org)

The second paper is not a single new algorithm but a framework for where to spend training effort in production systems. “Adaptation of Agentic AI,” posted to arXiv in December 2025 and since updated to a second version, splits the field into agent adaptation and tool adaptation, then breaks tool adaptation into “agent-agnostic” and “agent-supervised” forms. (arxiv.org, arxiv.org)

In its “agent-supervised tool adaptation” category, the survey describes keeping the agent fixed and training the tools around it from the agent’s own outputs. The examples it gives include reward-driven retriever tuning, adaptive rerankers, search subagents, and memory-update modules. (arxiv.org)

That matters for teams running agents on real workloads because retraining a foundation model is expensive, slow, and often operationally risky. The survey presents tool-side adaptation as a way to improve reliability, efficiency, and specialization after pretraining, using narrower components that are cheaper to swap or tune. (arxiv.org, arxiv.org)

The two papers attack different parts of the same problem. “Act Wisely” focuses on the moment before a tool call, while the survey’s tool-adaptation lens focuses on making the called tool, retriever, memory system, or subagent better suited to the frozen model that uses it. (arxiv.org, arxiv.org)

Both papers also fit a broader shift in agent research toward efficiency rather than raw capability alone. A January 2026 survey on efficient agents describes rising concern with costs such as latency, tokens, and step count in agent workflows, especially when systems loop through memory, planning, and tool use. (arxiv.org)

The practical takeaway is narrow but clear: the next gains in agents may come less from changing the base model and more from teaching it to pause before reaching for a tool, and from upgrading the tools it reaches for. (arxiv.org, arxiv.org)
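
For readers who want a concrete picture of the decoupled reward described for HDPO, here is a minimal Python sketch. It is not the paper's implementation: the names Trajectory, mixed_reward, and decoupled_signals, the 0.1 penalty, and the choice to score efficiency against the average tool count of correct trajectories are all assumptions made for illustration. It only contrasts a single mixed reward with two separate channels, where the tool-efficiency channel is active only on trajectories that were already correct.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One agent rollout (hypothetical structure, for illustration only)."""
    correct: bool    # did the final answer match the reference?
    tool_calls: int  # number of tool invocations in the rollout


def mixed_reward(traj, penalty=0.1):
    """Baseline: a single scalar mixing accuracy with a per-call penalty."""
    return (1.0 if traj.correct else 0.0) - penalty * traj.tool_calls


def decoupled_signals(group):
    """Return one (accuracy, efficiency) pair per trajectory.

    accuracy:   1.0 if the trajectory is correct, else 0.0.
    efficiency: non-zero only for correct trajectories; positive when the
                trajectory used fewer tool calls than the average correct
                trajectory in the group (an assumed baseline).
    """
    correct = [t for t in group if t.correct]
    baseline = sum(t.tool_calls for t in correct) / len(correct) if correct else 0.0
    signals = []
    for t in group:
        accuracy = 1.0 if t.correct else 0.0
        efficiency = (baseline - t.tool_calls) if t.correct else 0.0
        signals.append((accuracy, efficiency))
    return signals


# Four rollouts for the same prompt: two correct, two wrong.
group = [
    Trajectory(correct=True, tool_calls=0),
    Trajectory(correct=True, tool_calls=4),
    Trajectory(correct=False, tool_calls=0),
    Trajectory(correct=False, tool_calls=6),
]
signals = decoupled_signals(group)

for traj, (acc, eff) in zip(group, signals):
    print(traj, "mixed:", mixed_reward(traj), "accuracy:", acc, "efficiency:", eff)
```

In this toy group, the mixed reward scores the wrong answer that used six tools below the wrong answer that used none, so it can nudge a model toward skipping tools even when it is failing. The decoupled version gives both wrong answers the same efficiency signal of zero and only compares tool counts among the answers that were right.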