Google open‑sources AMS safety tool
- Google on April 27 open-sourced AMS, short for Activation-based Model Scanner, a tool that checks whether open-weight language models still retain safety training.
- Google says AMS scans a model in 10 to 40 seconds without prompting it, using activation patterns to flag tampering or “uncensored” fine-tunes.
- The release targets CI screening for open models as safety-modified repos proliferate. (opensource.googleblog.com)
Google has open-sourced AMS, a tool that checks whether an open-weight language model still has its safety training intact before deployment. (opensource.googleblog.com) AMS stands for Activation-based Model Scanner, and Google published it on April 27, 2026 through the Google Open Source Blog and a public GitHub repository under GoogleCloudPlatform. (opensource.googleblog.com) (github.com)

Most model safety checks work like a stress test: send the model lots of harmful prompts and see whether it refuses. Google says AMS skips that step and instead inspects the model’s internal activation patterns, the numeric signals produced as it processes text. (opensource.googleblog.com) (deepmind.google)

Google’s claim is that safety tuning leaves a measurable geometric pattern inside those activations. When a model’s safety training has been stripped out by fine-tuning, “abliteration,” or training on unfiltered data, AMS expects that pattern to collapse and flags the model. (opensource.googleblog.com) (zenodo.org)

The company says a scan takes 10 to 40 seconds on GPU hardware, fast enough to use in continuous integration pipelines or to screen large registries of downloaded models. The GitHub README also advertises JSON output for CI/CD workflows. (opensource.googleblog.com) (github.com)

Google’s examples split scans into quick, standard, and full modes. Standard checks three concepts (harmful content, injection resistance, and refusal capability), while full mode adds truthfulness. (opensource.googleblog.com) (github.com)

The supporting preprint, posted to Zenodo on April 10, says the validation covered 14 model configurations across the Llama, Gemma, and Qwen families, plus FP16, INT8, and INT4 quantization. It reports instruction-tuned models showing 3.8 to 8.4 sigma separation on safety concepts, while several uncensored models fell to 1.1 to 1.3 sigma. (zenodo.org) (letsdatascience.com)

Google is pitching AMS at a moment when tampered open models have become easier to distribute. The company cites a 2025 study that found more than 8,000 safety-modified repositories on Hugging Face, with modified models complying with unsafe requests 74% of the time, versus 19% for the original instruction-tuned versions. (opensource.googleblog.com)

The repository says AMS is “not an officially supported Google product,” even though Google published it and hosts the code. That leaves it looking less like a product launch and more like a research tool that security and compliance teams can wire into release checks now. (github.com)
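
For readers wondering what a figure like "3.8 sigma separation" could mean in practice, the sketch below shows one common way to express how far apart two groups of numbers sit: the gap between their means divided by their pooled standard deviation. This is an illustration only. The article does not give the formula AMS uses, so the function name, the toy values standing in for activation projections, and the 2.0 cutoff are all assumptions made for this example, not details taken from Google's code or the preprint.

```python
from math import sqrt
from statistics import mean, variance


def sigma_separation(group_a, group_b):
    """Gap between two group means, in units of pooled standard deviation.

    Hypothetical helper for illustration; not AMS's actual metric.
    """
    pooled_std = sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_std == 0:
        raise ValueError("both groups have zero variance")
    return abs(mean(group_a) - mean(group_b)) / pooled_std


# Toy numbers standing in for one-dimensional projections of activations.
# They are invented for this example and do not come from any real model.
benign = [0.0, 0.2, -0.2, 0.1, -0.1]
intact_harmful = [2.0, 2.2, 1.8, 2.1, 1.9]      # clearly separated from benign
collapsed_harmful = [0.2, 0.0, 0.4, 0.1, 0.3]   # overlaps heavily with benign

# Illustrative cutoff only; the article does not state the threshold AMS applies.
ASSUMED_THRESHOLD = 2.0

if __name__ == "__main__":
    for label, harmful in [("intact", intact_harmful),
                           ("collapsed", collapsed_harmful)]:
        sep = sigma_separation(harmful, benign)
        verdict = "pass" if sep >= ASSUMED_THRESHOLD else "flag"
        print(f"{label}: {sep:.2f} sigma -> {verdict}")
```

Under this reading, a model whose "harmful" and "benign" activations sit far apart scores high, while one whose two groups overlap scores low, which matches the direction of the gap the preprint reports between instruction-tuned and uncensored models.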