Guard models & compliance
- Companies are discussing 'guard models' as app-level firewalls to meet EU AI Act and NIST guidance requirements. (x.com)
- OpenAI and others are pushing compliance measures, reporting rules and stronger insider controls in recent posts. (x.com)
- Experts are urging shared responsibility between platforms, vendors and app developers to manage governance and secrecy risks. (x.com)
A “guard model” is turning into a new layer in enterprise AI stacks: a smaller system that sits between a user and a chatbot, screening prompts and outputs before they pass through. (arxiv.org)

That extra layer is being discussed as a way to meet rules that do not stop at the base model. The European Commission says obligations for general-purpose AI models began applying on August 2, 2025, and its guidance points to documentation, evaluation, adversarial testing and risk mitigation duties for the most capable systems. (digital-strategy.ec.europa.eu)

The U.S. government’s main voluntary framework points in the same direction. The National Institute of Standards and Technology’s Artificial Intelligence Risk Management Framework tells organizations to “govern, map, measure, and manage” risks, with clear roles, training and oversight for staff and partners. (nist.gov)

In practice, a guard model works like an application firewall for AI. It can block prompt injection, flag jailbreak attempts, stop sensitive data from leaving a system, and route risky requests for review before the main model answers. (arxiv.org)

That approach fits the way regulators are dividing responsibility. The Commission’s guidance says only actors making “significant modifications” to a general-purpose model take on provider obligations, while app builders and deployers still have their own compliance work around how a model is used. (digital-strategy.ec.europa.eu)

OpenAI has been arguing for layered controls rather than a single safety switch at the model level. Its public safety materials describe moderation models, monitoring for abuse, deployment reviews and board-level oversight for critical safety and security measures. (openai.com)

OpenAI’s current usage policies also frame safety as a shared job between the company and the people building on its tools. The policy says “responsible use is a shared priority,” a line that matches the wider industry push to split duties among platform providers, vendors and application teams. (openai.com)

The insider-risk piece is getting more attention because compliance is not only about outside attacks. NIST’s playbook calls for documented responsibilities, training, and agreements for personnel and partners, which is the kind of internal control companies use to limit who can access models, weights, logs and sensitive prompts. (nist.gov)

The European Union’s rules also raise the stakes for the biggest model makers. Commission guidance says providers of general-purpose AI models with systemic risk must do model evaluations with standardized protocols and state-of-the-art tools, including adversarial testing to identify and mitigate systemic risks. (digital-strategy.ec.europa.eu)

Researchers have been building the technical pieces for that compliance layer in public. Recent papers describe “guardrail” systems that watch for unsafe content, manipulation attacks and data leakage, giving companies a way to turn policy documents into software checks that run on every request. (arxiv.org)

The result is a quieter shift in how AI products are being built. Instead of treating one frontier model as the whole system, companies are adding a second model to police the first, and using that separation to show regulators, auditors and customers where the controls actually sit. (digital-strategy.ec.europa.eu)
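
The firewall-style screening described above (block prompt injection, route risky requests for review, stop sensitive data from leaving) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expressions stand in for what would normally be a trained guard model, and every name here (`screen_prompt`, `screen_output`, `guarded_call`, the pattern lists) is hypothetical rather than taken from any vendor's product or from the cited papers.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Verdict(Enum):
    ALLOW = "allow"
    REVIEW = "review"   # hold the request for a human reviewer
    BLOCK = "block"


@dataclass
class Decision:
    verdict: Verdict
    reason: str


# Hypothetical rule lists. A real guard model would be a learned
# classifier; regexes are used here only to keep the sketch runnable.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
]
RISKY_PATTERNS = [
    re.compile(r"\b(wire transfer|delete all records)\b", re.I),
]
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),   # email address
]


def screen_prompt(prompt: str) -> Decision:
    """Check an incoming prompt before the main model sees it."""
    if any(p.search(prompt) for p in INJECTION_PATTERNS):
        return Decision(Verdict.BLOCK, "possible prompt injection")
    if any(p.search(prompt) for p in RISKY_PATTERNS):
        return Decision(Verdict.REVIEW, "risky request")
    return Decision(Verdict.ALLOW, "no input rule matched")


def screen_output(text: str) -> Decision:
    """Check the main model's answer before it leaves the system."""
    if any(p.search(text) for p in SENSITIVE_PATTERNS):
        return Decision(Verdict.BLOCK, "sensitive data in output")
    return Decision(Verdict.ALLOW, "no output rule matched")


def guarded_call(
    prompt: str, main_model: Callable[[str], str]
) -> Tuple[Decision, Optional[str]]:
    """Run the main model only if the prompt passes; release the answer
    only if the output passes. Returns the final decision and the answer
    (None when the request was blocked or held for review)."""
    decision = screen_prompt(prompt)
    if decision.verdict is not Verdict.ALLOW:
        return decision, None
    answer = main_model(prompt)
    decision = screen_output(answer)
    if decision.verdict is not Verdict.ALLOW:
        return decision, None
    return decision, answer
```

Keeping the checks in a wrapper around the main model, rather than inside it, mirrors the separation the article describes: each `Decision` carries a reason that could be logged, which is one plausible way for an application team to show where a control sits.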