OpenAI releases Privacy Filter
What happened
- OpenAI published Privacy Filter, an open-source, on-device model that strips personal information from datasets.
- It's been released under an Apache 2.0 license as a local data-sanitization tool for enterprises.
- The release signals that privacy-preserving preprocessing is becoming a standard layer in enterprise AI stacks. (venturebeat.com)
Why it matters
OpenAI has released Privacy Filter, a small open-source model that finds and masks personal data in text on a laptop or in a browser. (openai.com) OpenAI published the model on April 22, 2026, with code on GitHub and weights under an Apache 2.0 license. The company said the model has 1.5 billion total parameters, with 50 million active at a time, and a 128,000-token context window for long documents. (openai.com, github.com)
The basic job is de-identification: the software scans text for names, addresses, account numbers, passwords, and other personal details, then replaces or removes them before the data reaches training, search, or analytics systems. OpenAI describes Privacy Filter as a bidirectional token-classification model with span decoding: it assigns a label to each token using context on both sides, then groups the labelled tokens into spans that can be redacted. (openai.com, cdn.openai.com)
Running that step on-device keeps raw text off external servers until the sensitive fields have been stripped. OpenAI said the model is intended for local preprocessing in enterprise pipelines, where companies often need to sanitize records before they can use them with larger models. (openai.com, venturebeat.com)
OpenAI said Privacy Filter reached 96% F1 on the PII-Masking-300k benchmark out of the box, and 97.43% on a corrected version of the same test set. The company also said the model can be fine-tuned for different data distributions and lets users trade off precision and recall through preset operating points. (cdn.openai.com, github.com)
The model card says Privacy Filter detects eight categories of personally identifiable information, including direct identifiers and secrets such as credentials. Hugging Face has already added documentation for the model, which points to expected use in open-source machine learning workflows rather than only inside OpenAI products. (github.com, github.com)
OpenAI also spelled out the model's limits. The company said it does not guarantee legal anonymization, performs best in English, and should be reviewed by humans in high-risk settings because missed detections and false positives can still happen. (cdn.openai.com, the-decoder.com)
That caveat matters because companies are under pressure to process internal data without exposing regulated information. No European or U.S. privacy rule names this model, but enterprises deploying AI systems have been adding preprocessing layers that separate usable text from personal records before those systems run. (venturebeat.com, openai.com)
The release also adds to OpenAI’s recent push to publish openly deployable tooling under permissive terms. This time, the company is not shipping a chatbot or a general-purpose model first; it is shipping the filter that sits in front of one. (openai.com, openai.com)
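OpenAI describes the model as labelling tokens and then decoding those labels into redactable spans. The Python sketch below illustrates that general pattern using the common BIO tagging scheme, in which a tag such as B-NAME starts an entity and I-NAME continues it. It is not Privacy Filter's code: the Token and Span classes, the tag names, the character offsets, and the [LABEL] placeholder format are all hypothetical, and the real model's label set, output format, and decoder may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    """One token with its character offsets and a hypothetical BIO tag."""
    text: str
    start: int  # offset of the first character in the source string
    end: int    # offset one past the last character
    tag: str    # e.g. "B-NAME", "I-NAME", or "O" (illustrative labels only)


@dataclass
class Span:
    label: str
    start: int
    end: int


def decode_spans(tokens: list[Token]) -> list[Span]:
    """Group BIO-tagged tokens into contiguous labelled spans."""
    spans: list[Span] = []
    current: Span | None = None
    for tok in tokens:
        prefix, _, label = tok.tag.partition("-")
        continues = prefix == "I" and current is not None and current.label == label
        if continues:
            # Same entity keeps going: stretch the open span.
            current.end = tok.end
            continue
        # Anything else closes the open span, if there is one.
        if current is not None:
            spans.append(current)
            current = None
        if prefix in ("B", "I"):
            # A "B-" tag (or a stray "I-" tag) opens a new span.
            current = Span(label, tok.start, tok.end)
    if current is not None:
        spans.append(current)
    return spans


def mask(text: str, spans: list[Span]) -> str:
    """Replace each span with a [LABEL] placeholder.

    Spans are applied right to left so earlier offsets stay valid.
    """
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[: span.start] + f"[{span.label}]" + text[span.end :]
    return text


# Toy model output for one sentence (tags are made up, not real predictions).
text = "Contact Jane Doe at 555-0100."
tokens = [
    Token("Contact", 0, 7, "O"),
    Token("Jane", 8, 12, "B-NAME"),
    Token("Doe", 13, 16, "I-NAME"),
    Token("at", 17, 19, "O"),
    Token("555-0100", 20, 28, "B-PHONE"),
    Token(".", 28, 29, "O"),
]
print(mask(text, decode_spans(tokens)))  # Contact [NAME] at [PHONE].
```

Decoding into spans before masking is what lets a multi-token name like "Jane Doe" collapse into a single placeholder instead of two.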
Key numbers
- Release: April 22, 2026, with code on GitHub and weights under an Apache 2.0 license. (openai.com, github.com)
- Size: 1.5 billion total parameters, with 50 million active at a time. (openai.com, github.com)
- Context window: 128,000 tokens. (openai.com, github.com)
- Coverage: eight categories of personally identifiable information. (github.com)
- Accuracy: 96% F1 on the PII-Masking-300k benchmark out of the box, and 97.43% on a corrected version of the same test set. (cdn.openai.com, github.com)
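The benchmark figures are F1 scores, the harmonic mean of precision (how many redacted spans really were personal data) and recall (how much of the personal data got redacted): F1 = 2PR / (P + R). The Python sketch below computes all three on a toy example to show how one model can be pushed toward either side of that trade-off. It assumes exact-match span scoring and treats an "operating point" as a simple confidence cutoff; both are simplifications of our own, and the span tuples, scores, and threshold names are invented rather than taken from Privacy Filter or PII-Masking-300k.

```python
# Hypothetical span identity: (label, start offset, end offset).
SpanKey = tuple[str, int, int]


def precision_recall_f1(predicted: set[SpanKey], gold: set[SpanKey]) -> tuple[float, float, float]:
    """Exact-match span scoring: a prediction counts only if label and offsets all match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


def spans_at_threshold(candidates: list[tuple[SpanKey, float]], threshold: float) -> set[SpanKey]:
    """Keep candidate spans whose confidence meets the cutoff (an assumed model of an operating point)."""
    return {span for span, score in candidates if score >= threshold}


# Invented ground truth and scored model candidates for one document.
gold = {("NAME", 8, 16), ("PHONE", 20, 28), ("EMAIL", 40, 55)}
candidates = [
    (("NAME", 8, 16), 0.95),
    (("PHONE", 20, 28), 0.70),
    (("EMAIL", 40, 55), 0.40),
    (("NAME", 60, 64), 0.50),  # false positive: not in gold
]
# Hypothetical preset names and cutoffs, for illustration only.
operating_points = {"high_precision": 0.8, "balanced": 0.6, "high_recall": 0.3}

for name, threshold in operating_points.items():
    p, r, f1 = precision_recall_f1(spans_at_threshold(candidates, threshold), gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data the three cutoffs give F1 scores of 0.50, 0.80, and 0.86: lowering the cutoff catches more real spans but lets the false positive through, so recall rises while precision falls.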
What happens next
- Hugging Face has already added documentation for the model, which suggests it will be used in open-source machine learning workflows, not only inside OpenAI products.