Model Supply-Chain Risk & Tooling
- Researchers warned that malicious GGUF-formatted model files can trigger remote code execution against SGLang model servers.
- GitHub updated CodeQL to support custom sanitizers and validators in models-as-data, enabling bespoke scanning rules.
- Together, the alerts and CodeQL improvements highlight that model formats and serving layers are active attack surfaces needing custom detection.

A model file is supposed to be weights and metadata, but researchers said one GGUF file could execute code on an SGLang server when the server handled a reranking request. (labs.cloudsecurityalliance.org)

SGLang is a framework for serving large language models, and its documentation says it is built for low-latency, high-throughput inference from one graphics processing unit to distributed clusters. GGUF is a single-file model format designed for easy distribution and loading, with room for extra metadata. (docs.sglang.io) (github.com)

The flaw is tracked as CVE-2026-5760 and scored 9.8 out of 10 under the Common Vulnerability Scoring System, according to the Cloud Security Alliance research note published April 21. The note said a malicious `tokenizer.chat_template` inside a GGUF file could trigger a server-side template injection in SGLang. (labs.cloudsecurityalliance.org)

The reported attack path starts with a poisoned GGUF model, then reaches SGLang’s `/v1/rerank` endpoint, where the server renders the embedded template with Jinja2 and runs attacker-controlled Python code. The Hacker News reported the trigger involved a Qwen3 reranker phrase that activated the vulnerable code path in `serving_rerank.py`. (labs.cloudsecurityalliance.org) (thehackernews.com)

That turns the model supply chain into part of the attack surface. A team can download a file from a public repository, treat it like data, and still end up exposing the machine that serves the model. (labs.cloudsecurityalliance.org) (github.com)

GitHub moved on a related problem on April 21, saying CodeQL now supports custom sanitizers and validators in models-as-data across C and C++, C#, Go, Java and Kotlin, JavaScript and TypeScript, Python, Ruby, and Rust. CodeQL is GitHub’s static analysis engine for code scanning. (github.blog) (codeql.github.com)

In practice, that update lets security teams describe their own “this makes data safe” and “this checks data first” rules with data extensions instead of hard-coding every pattern into queries. GitHub said the feature works in models-as-data, the declarative layer used to teach CodeQL about project-specific frameworks and flows. (github.blog) (codeql.github.com)

The timing ties two parts of the same problem together: model-serving stacks now include parsers, template engines, web endpoints, and file formats that scanners may not understand out of the box. Custom rules matter when the dangerous input is not a form field or query string, but a model artifact loaded deep inside an artificial intelligence pipeline. (labs.cloudsecurityalliance.org) (github.blog)

Researchers compared CVE-2026-5760 to “Llama Drama,” the 2024 flaw in `llama_cpp_python` that also turned model metadata into code execution. The pattern is becoming familiar: the file that looks like content can carry instructions a serving layer interprets too eagerly. (thehackernews.com) (labs.cloudsecurityalliance.org)

The immediate lesson is narrower than “don’t download models” and broader than one SGLang bug. Treat model files, conversion tools, and inference servers as software supply-chain components, because attackers increasingly do. (labs.cloudsecurityalliance.org) (github.blog)
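
One way to act on that lesson is to triage a model's embedded chat template before any server renders it. The sketch below is a minimal, hedged illustration of that idea, not SGLang's fix and not a reproduction of the CVE-2026-5760 exploit. It assumes the `tokenizer.chat_template` string has already been extracted from the GGUF metadata by some other tool. The names `SUSPICIOUS_PATTERNS` and `scan_chat_template`, the chosen patterns, and both sample templates are hypothetical and exist only for illustration.

```python
import re

# Hypothetical deny-list of Jinja2 constructs that commonly appear in
# server-side template injection payloads. Illustrative, not exhaustive:
# pattern matching can be evaded and should only be used for triage.
SUSPICIOUS_PATTERNS = {
    "dunder attribute access": re.compile(r"__\w+__"),
    "attr filter": re.compile(r"\|\s*attr\s*\("),
    "import or include tag": re.compile(r"\{%-?\s*(import|from|include)\b"),
    "process-spawning call": re.compile(r"\b(popen|subprocess|system)\s*\("),
}


def scan_chat_template(template: str) -> list[str]:
    """Return the label of every suspicious pattern found in the template text."""
    return [
        label
        for label, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(template)
    ]


# Assumed inputs: template strings already pulled out of model metadata.
benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
# Generic, widely documented style of Jinja2 injection payload (illustrative only).
malicious = "{{ cycler.__init__.__globals__.os.popen('id').read() }}"

if __name__ == "__main__":
    for name, template in (("benign", benign), ("malicious", malicious)):
        findings = scan_chat_template(template)
        verdict = "flag for review" if findings else "no findings"
        print(f"{name}: {verdict} {findings}")
```

A scanner like this only narrows down which files deserve a closer look. Rendering untrusted templates inside Jinja2's sandbox (`jinja2.sandbox.ImmutableSandboxedEnvironment`) is generally the stronger control, because it restricts what a template can reach at render time instead of guessing at payload shapes.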