OpenAI ships Privacy Filter model
- OpenAI said on April 22 it released Privacy Filter, an open-weight model that detects and redacts personally identifiable information in text before companies index, log, or share that data.
- The model is built to run locally in a single pass, with 1.5 billion total parameters, 50 million active parameters, and a 128,000-token context window for long documents.
- The release expands OpenAI’s newer open-model push and gives developers an Apache 2.0-licensed privacy tool they can inspect, fine-tune, and deploy on-premises. (openai.com)
Privacy filtering is the step where software spots names, emails, phone numbers, and other personal details before text is stored or sent elsewhere. OpenAI said on April 22 that it released Privacy Filter, an open-weight model for that job. (openai.com)

OpenAI described Privacy Filter as a model for detecting and redacting personally identifiable information, or PII, in unstructured text. The company said it is meant for training, indexing, logging, and review pipelines. (openai.com)

Instead of generating text one token at a time like a chatbot, Privacy Filter labels tokens across an input in one forward pass and then decodes spans to decide what should be masked. OpenAI said that design is aimed at high-throughput sanitization workflows. (huggingface.co) (github.com)

The company released the model under the Apache 2.0 license and published weights on Hugging Face alongside code and examples on GitHub. OpenAI said teams can run it in their own environments and fine-tune it for custom data distributions. (huggingface.co) (github.com)

OpenAI said the model has 1.5 billion total parameters, with 50 million active parameters, and can run in a web browser or on a laptop. Its context window is 128,000 tokens, which lets it process long transcripts or logs without chunking them first. (huggingface.co)

OpenAI said the model can run locally so unfiltered text does not have to leave the device before redaction. The company said it already uses a fine-tuned version of Privacy Filter in its own privacy-preserving workflows. (openai.com)

OpenAI also said Privacy Filter reaches state-of-the-art performance on the PII-Masking-300k benchmark after correcting annotation issues it identified during evaluation. The released model card describes it as a bidirectional token-classification model with span decoding. (openai.com) (cdn.openai.com)

The release fits a broader shift in OpenAI’s product line toward open-weight tools that developers can inspect and run outside OpenAI’s hosted products. OpenAI’s open-models page now lists gpt-oss and gpt-oss-safeguard alongside Privacy Filter-related resources. (openai.com 1) (openai.com 2)

For companies handling meeting transcripts, support logs, and software traces, the pitch is simple: strip personal data before the text is indexed or forwarded to larger systems. OpenAI’s release turns that workflow into a local model download instead of a cloud-only service. (openai.com) (github.com)
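
For readers unfamiliar with the label-then-decode approach described above, the idea can be illustrated with a minimal Python sketch. It does not load or call Privacy Filter. The per-token labels are written by hand, and the BIO tag scheme, the category names (`NAME`, `EMAIL`), and the functions `decode_spans` and `redact` are assumptions made for illustration, not the model's actual label set or decoder. The merge shown here is a simple greedy pass, which may differ from the span decoding in OpenAI's release.

```python
from typing import List, Optional, Tuple

# One entry per token: (start_char, end_char, label).
# Labels use a hypothetical BIO scheme: "B-X" begins a span of
# category X, "I-X" continues it, and "O" marks a non-PII token.
Token = Tuple[int, int, str]
Span = Tuple[int, int, str]


def decode_spans(tokens: List[Token]) -> List[Span]:
    """Greedily merge per-token labels into (start, end, category) spans."""
    spans: List[Span] = []
    current: Optional[List] = None  # [start, end, category] being built

    for start, end, label in tokens:
        if label == "O":
            # A non-PII token closes any open span.
            if current is not None:
                spans.append((current[0], current[1], current[2]))
                current = None
            continue

        prefix, category = label.split("-", 1)
        if prefix == "I" and current is not None and current[2] == category:
            # Continuation of the open span: extend its end offset.
            current[1] = end
        else:
            # "B-" tag, or an "I-" tag with no matching open span:
            # close whatever is open and start a new span.
            if current is not None:
                spans.append((current[0], current[1], current[2]))
            current = [start, end, category]

    if current is not None:
        spans.append((current[0], current[1], current[2]))
    return spans


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span in the text with a [CATEGORY] placeholder."""
    pieces: List[str] = []
    cursor = 0
    for start, end, category in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(f"[{category}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Contact Jane Doe at jane@example.com today."

# Hand-written labels standing in for the output of a single forward pass.
tokens: List[Token] = [
    (0, 7, "O"),         # Contact
    (8, 12, "B-NAME"),   # Jane
    (13, 16, "I-NAME"),  # Doe
    (17, 19, "O"),       # at
    (20, 36, "B-EMAIL"), # jane@example.com
    (37, 42, "O"),       # today
    (42, 43, "O"),       # .
]

print(redact(text, decode_spans(tokens)))
# → Contact [NAME] at [EMAIL] today.
```

Because every token is labeled in the same pass, the redaction step reduces to merging adjacent labels and substituting placeholders. That is why this style of model suits bulk sanitization better than token-by-token generation.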