Fake repo claiming to host OpenAI Privacy Filter drew ~244k downloads

- Hugging Face users flagged a typosquat repo, Open-OSS/privacy-filter, that impersonated OpenAI’s new Privacy Filter model and bundled malware instead of clean helper files.
- The fake page showed 244,168 downloads and 208 likes before Hugging Face staff confirmed the report; the payload chain ended in Boxter malware.
- OpenAI had just released the real model on April 22, making it a fast-moving, high-interest target for supply-chain impersonation.

Open-source model hubs just got a sharp reminder that the dangerous part is often not the model weights but the small convenience files wrapped around them. Over the weekend, users spotted a fake Hugging Face repo named `Open-OSS/privacy-filter` that copied OpenAI’s newly released Privacy Filter page and added malicious scripts. The repo was impersonating `openai/privacy-filter`, the real model OpenAI released on April 22, 2026, for detecting and redacting personally identifiable information in text.

### What is Privacy Filter?

Privacy Filter is an open-weight PII detection model: a tool for finding names, emails, phone numbers, addresses, and other sensitive text before that data gets indexed, logged, or reused. OpenAI says it is meant for high-throughput sanitization workflows, can run locally, and is small enough for a laptop or browser-class setup despite using 1.5B total parameters with 50M active parameters and a 128,000-token context window. (huggingface.co)

### Why was this model an easy target?

Because it was new, useful, and had real momentum. OpenAI published the model under Apache 2.0 and put it on Hugging Face and GitHub for experimentation and commercial use. That kind of release creates a rush of developers cloning repos, testing demos, and wiring examples into internal tooling, which is exactly the moment when a lookalike page can catch people moving too fast. (openai.com)

### What did the fake repo actually do?

The malicious repo appears to have copied the official model card almost verbatim, then slipped in two extra files: `loader.py` and `start.bat`. According to the posted analysis, the chain ran in four steps: `start.bat` ran `loader.py` before normal dependency setup; `loader.py` pulled a PowerShell command from `jsonkeeper.com`; that command downloaded another batch file from `api.eth-fastscan.org`; and that batch file delivered Boxter malware. That is the classic supply-chain trick: make the page look familiar, then hide the attack in setup glue code most users barely read.
(openai.com)

### How many people touched it?

The number that makes this story land is 244,168 downloads. The warning post on the official OpenAI model discussion page also listed 208 likes on the fake repo before it was taken down. “Downloads” on model hubs do not equal confirmed infections (a pull can be automated, partial, or harmless if nobody ran the scripts), but the figure does show how much exposure a convincing fake can rack up in a very short window. (huggingface.co)

### Was the repo removed?

Yes, at least by the time Hugging Face staff responded on the official discussion thread. An OpenAI org maintainer replied that the impersonating repo “seems to be down now” and closed the thread. That matters because it confirms the warning was seen and acted on, not just circulating as rumor on social media.

### Why are helper files the real risk?

Model weights on Hugging Face are often stored as `safetensors`, which are designed to avoid arbitrary code execution. (huggingface.co) But repos also carry notebooks, shell scripts, Python loaders, and Windows batch files. Those files can phone home, fetch second-stage payloads, or alter your environment before the model ever loads. In other words, the “downloaded a model” mental model is too narrow. You are often downloading a mini software package.

### So what should teams do differently?

Treat model repos like third-party code, not like static assets. Pin exact sources. Verify publisher identity. Check hashes and file diffs. Avoid running bundled scripts you did not inspect. Prefer loading weights through standard libraries from the official namespace instead of clicking whatever helper launcher a repo provides. That advice was already good hygiene, but this incident makes it feel less theoretical. (huggingface.co)

### Bottom line?

The fake Privacy Filter repo matters because it hit a real weak spot in the AI stack: developers trust familiar model pages more than they trust random executables, but a repo can quietly be both. Open model distribution is getting easier, which also means model impersonation is getting cheaper. (huggingface.co)
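
The hygiene steps described above (verify the publisher, check hashes, do not run uninspected helper scripts) could be sketched as a small pre-flight check. This is a minimal illustration in Python using only the standard library, not a Hugging Face API: the function names, the `TRUSTED_NAMESPACES` allowlist, and the `RISKY_EXTENSIONS` set are all hypothetical choices a team would tune for itself, and the file list is assumed to come from whatever repo listing you already have.

```python
import hashlib
from pathlib import PurePosixPath

# Assumption: the namespaces this team has explicitly decided to trust.
TRUSTED_NAMESPACES = {"openai"}

# Assumption: extensions treated as runnable helper code that needs review.
RISKY_EXTENSIONS = {".py", ".bat", ".cmd", ".ps1", ".sh", ".ipynb"}


def is_trusted_repo(repo_id: str) -> bool:
    """Return True only if repo_id is 'namespace/name' with an allowlisted namespace."""
    namespace, sep, name = repo_id.partition("/")
    return bool(sep and name) and namespace in TRUSTED_NAMESPACES


def flag_helper_files(filenames):
    """Return the files whose extension suggests executable code, sorted by name."""
    return sorted(
        f for f in filenames
        if PurePosixPath(f).suffix.lower() in RISKY_EXTENSIONS
    )


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 of data against a digest obtained from a trusted source."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical file listing modeled on the incident described above.
    files = ["model.safetensors", "config.json", "loader.py", "start.bat"]
    for repo in ("openai/privacy-filter", "Open-OSS/privacy-filter"):
        print(repo, "trusted:", is_trusted_repo(repo))
    print("needs review:", flag_helper_files(files))
```

The namespace check is an exact match by design, so a lookalike organization name fails even when the model name is identical. A check like this only narrows what needs human review; it does not prove a repo is safe, and an allowlisted namespace can still ship scripts worth reading before anything is executed.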
