Agents: governance + quantization

Two recent YouTube pieces captured twin enterprise concerns about AI agents: a satirical video highlighted the risks of agents acting without human oversight, while a technical video focused on Google's new quantization techniques that cut model serving costs. Together they frame the debate over permissions, auditability and whether compressed models can make agent deployments economically feasible at scale. (youtube.com 1) (youtube.com 2)

AI agents are moving from chatbot demos to software that can read, decide and act inside business systems, and that is forcing two questions at once: who approves the action, and how much the action costs to run. (mitsloan.mit.edu)

An agent differs from a chatbot because it does more than answer a prompt. MIT Sloan said agents are designed to integrate with other software systems and complete multistep tasks independently or with minimal human supervision. (mitsloan.mit.edu)

That autonomy turns identity and permissions into a core design problem. In a November 19, 2025 discussion, Ory’s Jeffrey Hickman said companies need to decide “who can act, on whose behalf, and under what circumstances” before an agent can trigger payments, move data or change production systems. (em360tech.com)

Governance groups are starting to write those rules down. Singapore’s Infocomm Media Development Authority published Version 1.0 of its Model AI Governance Framework for Agentic AI on January 22, 2026, and said the guidance is aimed at organizations building agents in-house or buying third-party agent systems. (imda.gov.sg)

The framework says humans remain ultimately accountable for agent behavior. It focuses on risks created by agents’ access to sensitive data and their ability to make changes to their environment, including more unpredictable outcomes from interactions among multiple agents. (imda.gov.sg)

The United States already has a broader template for this kind of control. The National Institute of Standards and Technology Artificial Intelligence Risk Management Framework, released in January 2023, organizes risk work under four functions: govern, map, measure and manage. (nist.gov)

Security groups are also shifting from model safety to agent safety. The Open Worldwide Application Security Project published its Top 10 for Agentic Applications in December 2025, describing autonomous, tool-using systems as a distinct security category that needs its own threat models and mitigations. (genai.owasp.org)

The second pressure point is cost. Google Research said on March 24, 2026 that large language model serving is often bottlenecked by the key-value cache, a fast-access memory store that keeps the model’s recent working context ready during generation. (research.google)

Google’s new TurboQuant method compresses the high-dimensional vectors held in that cache so they use less memory without changing model quality in its tests. In the accompanying paper, the researchers reported “absolute quality neutrality” for key-value cache quantization at 3.5 bits per channel and only marginal degradation at 2.5 bits per channel. (research.google) (arxiv.org)

Google said TurboQuant will be presented at the International Conference on Learning Representations in 2026, and said the same compression approach can also speed vector search systems used in retrieval and search engines. That matters for agents because many of them rely on both long context windows and retrieval systems to decide what to do next. (research.google) (arxiv.org)

The enterprise pitch for agents now rests on both tracks at once. A company can give an agent more tools and more memory, but it still has to show who set the permissions, who can audit the action log, and whether the compressed model is cheap enough to run at production scale. (imda.gov.sg) (research.google)
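
For a rough sense of what the reported bit widths mean for memory, the back-of-envelope sketch below estimates key-value cache size at 3.5 and 2.5 bits per channel against an assumed 16-bit baseline. The model shape (layers, heads, head size, context length) is hypothetical and chosen only for illustration, the 16-bit baseline is an assumption rather than a figure from Google, and the arithmetic ignores any metadata a real quantization scheme may store alongside the compressed values. It is not TurboQuant itself, only the standard cache-size calculation.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits_per_channel):
    """Approximate key-value cache size for one sequence, in bytes."""
    # Two tensors (keys and values) are cached per layer, each holding
    # kv_heads * head_dim channels for every token in the context.
    channels = 2 * layers * kv_heads * head_dim * tokens
    return channels * bits_per_channel / 8


# Hypothetical model shape, for illustration only (not from the source).
LAYERS, KV_HEADS, HEAD_DIM, TOKENS = 32, 8, 128, 128_000

# 16 bits is an assumed uncompressed baseline; 3.5 and 2.5 are the
# bit widths reported in the paper.
baseline = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, TOKENS, 16)

for bits in (16, 3.5, 2.5):
    size = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, TOKENS, bits)
    print(f"{bits:>4} bits/channel: {size / 2**30:6.2f} GiB "
          f"({baseline / size:.1f}x smaller than 16-bit)")
```

Under these assumptions the cache shrinks in direct proportion to the bit width, so 3.5 bits per channel takes roughly a fifth of the 16-bit footprint. Whether that translates into lower serving cost depends on factors the sketch does not model, such as batch size and hardware.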
