OpenAI Privacy Filter

- OpenAI published a small model that detects and removes personal data from text to support on‑device filtering.
- The 1.5B‑parameter model is open‑sourced under Apache 2.0 and designed to run locally for PII redaction.
- It enables local sanitisation before data hits larger systems, lowering privacy risk in enterprise workflows. (news.bloomberglaw.com)

OpenAI has released a small open-source model that strips personal data from text on a laptop or in a browser before it is sent elsewhere. (openai.com) The model, called Privacy Filter, is available under the Apache 2.0 license on GitHub and Hugging Face, and OpenAI said it is intended for experimentation, customization, and commercial deployment. OpenAI published the release on April 22, 2026, and Bloomberg Law reported the launch on April 23. (openai.com) (github.com) (news.bloomberglaw.com)

Privacy Filter is built to find personally identifiable information, or PII, in messy text: names, bank account numbers, addresses, and other details tied to a private person. OpenAI said the model can make those calls in one pass over long documents instead of splitting files into smaller chunks first. (openai.com) (cdn.openai.com)

OpenAI says the model has 1.5 billion total parameters, with 50 million active parameters, and a 128,000-token context window. The company says that size lets it run locally while still handling long records used in privacy reviews, training-data cleanup, and enterprise document workflows. (github.com) (huggingface.co)

Running locally changes where the risky step happens. Instead of sending raw text to a remote service for de-identification, a company can mask sensitive details on-device and pass along only the cleaned version. (openai.com)

That setup fits a problem that has grown with wider use of large language models: businesses want to summarize contracts, support tickets, medical notes, and internal records, but those files often contain data that privacy laws treat as sensitive. Bloomberg Law said the release comes as regulators keep pressing AI companies on how they collect and handle personal information. (news.bloomberglaw.com)

OpenAI said it already uses a fine-tuned version of Privacy Filter in its own privacy-preserving workflows. The company also said the model can be fine-tuned for different data distributions and privacy policies, which matters for sectors that redact different fields or use different thresholds for false positives. (openai.com) (github.com)

The technical design is closer to a labeler than a chatbot. OpenAI’s model card describes it as a bidirectional token-classification model with span decoding, meaning it reads text in context and marks the exact words that should be masked. (cdn.openai.com)

OpenAI also says the model can distinguish between information that should stay because it is public and information that should be hidden because it belongs to a private individual. That is a harder task than simple pattern matching, which can catch phone-number formats but miss context around public figures, aliases, or indirect identifiers. (openai.com) (cdn.openai.com)

The release adds another open-weight project to OpenAI’s public GitHub at a time when companies are looking for smaller models that can do one job well without moving sensitive data off-device. In this case, the job is narrow and concrete: clean the text first, then let bigger systems touch what remains. (github.com 1) (github.com 2)
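
For readers who want a concrete picture of what "token classification with span decoding" followed by masking can look like, here is a minimal Python sketch. It is not OpenAI's code and does not call Privacy Filter. It assumes, purely for illustration, that a tagger has already assigned each token a character range and a BIO-style tag (`B-` opens a span, `I-` continues it, `O` means leave the token alone). The tag names `NAME` and `PHONE`, the function names, and the sample tags are all hypothetical; the model's real label set and decoding scheme may differ.

```python
from typing import List, Tuple

# Hypothetical per-token tagger output: (start_char, end_char, BIO tag).
# Tag names are illustrative only, not Privacy Filter's actual labels.
TokenTag = Tuple[int, int, str]
Span = Tuple[int, int, str]


def decode_spans(tagged: List[TokenTag]) -> List[Span]:
    """Merge consecutive B-/I- tokens of the same type into character spans."""
    spans: List[Span] = []
    current = None  # [start, end, label] of the span being built, if any
    for start, end, tag in tagged:
        prefix, _, label = tag.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[2] != label)
        )
        if starts_new:
            if current is not None:
                spans.append((current[0], current[1], current[2]))
            current = [start, end, label]
        elif prefix == "I":
            current[1] = end  # extend the open span to cover this token
        else:  # "O": close any open span
            if current is not None:
                spans.append((current[0], current[1], current[2]))
            current = None
    if current is not None:
        spans.append((current[0], current[1], current[2]))
    return spans


def mask(text: str, spans: List[Span]) -> str:
    """Replace each span with a [LABEL] placeholder, keeping other text as is."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Call Jane Doe at 555-0100 today."
# Hand-written tags standing in for what a local model might emit.
tagged = [
    (0, 4, "O"),
    (5, 9, "B-NAME"),
    (10, 13, "I-NAME"),
    (14, 16, "O"),
    (17, 25, "B-PHONE"),
    (26, 32, "O"),
]
spans = decode_spans(tagged)
cleaned = mask(text, spans)
print(cleaned)
```

In the workflow the article describes, only the masked string (`cleaned` here) would be passed to a larger remote system, while the raw text stays on the device.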

