OpenAI Privacy Filter
- OpenAI open-sourced a small Privacy Filter model that masks sensitive information before text is sent to chatbots.
- The filter automatically redacts personal data, is free to use, and is designed to run as a pre-inference privacy layer.
- Packaging redaction into runtime infrastructure shifts privacy from policy pages to the execution path and suggests composable pre-inference controls for consumer agents. (decrypt.co)
OpenAI has released Privacy Filter, a small open-source model that strips personal data from text before it reaches a chatbot. (openai.com) The model is built to detect personally identifiable information such as names, email addresses, phone numbers, bank details, and other sensitive spans in unstructured text. OpenAI published it on GitHub and Hugging Face under an Apache 2.0 license, which allows commercial use and modification. (github.com)

OpenAI says the model can run locally on a laptop or in a web browser, so raw text can be filtered on-device instead of being sent elsewhere first. The company describes it as a high-throughput tool for teams that need on-premises privacy filtering they can inspect and tune. (openai.com)

Privacy filters work like a digital black marker: they scan text for sensitive details and replace those spans before another system reads them. OpenAI’s version is a token-classification model, meaning it labels pieces of text and then groups them into coherent redacted spans. (openai.com)

OpenAI says the release is aimed at enterprise datasets, logs, support transcripts, and other text pipelines where companies want to reduce exposure before inference or storage. The model card says it is designed for “detection and redaction,” not as a guarantee that every sensitive detail will always be caught. (cdn.openai.com)

The technical pitch is speed with context. OpenAI says Privacy Filter has 1.5 billion total parameters, with 50 million active at a time, and supports a 128,000-token context window for long documents without chunking. (huggingface.co)

That design matters because older privacy tools often rely on pattern matching, like spotting strings that look like phone numbers or Social Security numbers. A context-aware model can make finer judgments about whether “Alice,” “Jordan,” or “Washington” refers to a private person, a place, or a public reference. (venturebeat.com)

OpenAI is not the first company to ship open-source privacy tooling. Microsoft’s Presidio, one of the better-known existing projects in this category, also offers detection and redaction pipelines for sensitive data across text and other formats. (github.com)

The timing fits a broader push to move safeguards into the software path itself instead of leaving them to policy documents or user behavior. OpenAI already offers separate guardrail tools for prompt injection, unsafe content, and other checks around model use. (guardrails.openai.com)

For developers building consumer agents or enterprise assistants, the practical change is simple: redact first, send later. OpenAI’s release turns that step into a downloadable component instead of a promise buried in settings or compliance paperwork. (decrypt.co)
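
To make the "redact first, send later" step concrete, here is a minimal Python sketch of a pre-inference redaction layer. It is an illustration under stated assumptions, not Privacy Filter's API: the regular expressions stand in for the older pattern-matching approach described above, and the names `PATTERNS`, `redact`, `redact_then_send`, and the `send` callable are all hypothetical.

```python
import re
from typing import Callable

# Hypothetical patterns for illustration only. Real PII detection needs far
# broader coverage, and regexes lack the context a trained model can use.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each matched span with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def redact_then_send(text: str, send: Callable[[str], str]) -> str:
    """Run redaction first so that `send` never receives the raw text."""
    return send(redact(text))


if __name__ == "__main__":
    raw = "Contact Alice at alice@example.com or 555-123-4567."
    # An identity function stands in for the downstream chatbot call.
    print(redact_then_send(raw, lambda prompt: prompt))
```

Run as a script, this prints `Contact Alice at [EMAIL] or [PHONE].` The name "Alice" passes through untouched, which is the kind of gap a context-aware model is meant to close; in a real pipeline the `redact` step would presumably be swapped for a call to such a model while the wrapper stays the same.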