OpenAI Privacy Filter
- OpenAI released an open-source, on-device Privacy Filter to remove personal information from enterprise datasets.
- The model is published under an Apache 2.0 license for on-device data sanitization.
- The tool treats redaction as a product feature to help deploy AI safely inside regulated workflows. (venturebeat.com)
Before this release, privacy filtering meant scanning text for personal details like names, emails, or account numbers and masking them before the data moved elsewhere. OpenAI has now published a small open-source model that does that job on-device, so raw text can stay on a laptop or in a browser instead of being sent to a server. (openai.com)

OpenAI announced Privacy Filter on April 22, 2026, and released it under the Apache 2.0 license on GitHub and Hugging Face. The company said the model is designed for de-identification, the process of removing personally identifiable information from text before it is stored, searched, or used in artificial intelligence systems. (openai.com) (github.com) (huggingface.co)

The model is a bidirectional token-classification system, which means it reads text and labels the spans that should be redacted. OpenAI said the release has 1.5 billion parameters in total, 50 million active parameters, and a 128,000-token context window, so it can process long documents without splitting them into small chunks. (openai.com) (github.com)

OpenAI said the model is small enough to run locally in a web browser or on a laptop. The company also said users can tune precision and recall, the tradeoff between catching more sensitive text and avoiding false alarms, through preset operating points and fine-tuning. (github.com) (huggingface.co)

That setup targets a common enterprise problem: companies want to use internal documents for search, analytics, or model training, but regulated records can contain patient data, financial identifiers, employee details, and customer information. OpenAI said keeping the filtering step local reduces exposure because unfiltered text does not need to leave the device for redaction. (openai.com) (venturebeat.com)

OpenAI’s model card says the system was trained and evaluated on synthetic privacy datasets built from public data, with added format-matching augmentation to widen the range of names, identifiers, and surface forms it could detect. The same card says the model is intended for privacy filtering in text and documents, not as a general-purpose language model. (cdn.openai.com)

The company framed the release as part of a broader push to make privacy controls a built-in product feature rather than a separate compliance step. In its enterprise privacy materials, OpenAI says business customers retain control over their inputs and outputs across ChatGPT Business, ChatGPT Enterprise, ChatGPT for Healthcare, ChatGPT Edu, ChatGPT for Teachers, and the API platform. (venturebeat.com) (openai.com)

The open-source license also makes this different from a hosted-only safety tool. Apache 2.0 allows commercial use, modification, and redistribution, which means companies can adapt the filter to their own document formats, run it inside private environments, and ship it inside internal workflows without negotiating a separate model license. (github.com) (huggingface.co)

OpenAI said the model delivers “frontier-level privacy filtering performance” while staying small enough for local deployment, but the real test will be whether companies trust it on messy records from hospitals, banks, insurers, and large customer-service systems. The release gives those teams a new option: redact first, and move the data later. (openai.com) (venturebeat.com)
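
For readers who want a concrete picture of the "label the spans, then redact" step and the precision/recall tradeoff described above, here is a minimal Python sketch. It does not call Privacy Filter or any OpenAI API. The `TokenScore` type, the `NAME` and `EMAIL` labels, the confidence scores, and the two threshold values are all hypothetical stand-ins for whatever a token classifier might emit; the real model's labels and operating points may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenScore:
    """One hypothetical classifier output: a character range, a label, a score."""
    start: int    # character offset where the token begins
    end: int      # character offset just past the token
    label: str    # illustrative category name, not taken from the real model
    score: float  # illustrative confidence that the token is sensitive


def merge_spans(tokens, threshold):
    """Keep tokens at or above the threshold and merge same-label neighbours."""
    spans = []
    for tok in sorted(tokens, key=lambda t: t.start):
        if tok.score < threshold:
            continue  # below the operating point: treated as not sensitive
        # Merge when the previous span has the same label and is at most
        # one character (for example a space) away from this token.
        if spans and spans[-1][2] == tok.label and tok.start - spans[-1][1] <= 1:
            spans[-1] = (spans[-1][0], tok.end, tok.label)
        else:
            spans.append((tok.start, tok.end, tok.label))
    return spans


def redact(text, tokens, threshold=0.5):
    """Replace every merged span with a bracketed label placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in merge_spans(tokens, threshold):
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def make_token(text, word, label, score):
    """Helper for the demo: locate a word and wrap it as a TokenScore."""
    start = text.index(word)
    return TokenScore(start, start + len(word), label, score)


TEXT = "Contact Jane Doe at jane@example.com today."
TOKENS = [
    make_token(TEXT, "Jane", "NAME", 0.95),
    make_token(TEXT, "Doe", "NAME", 0.60),
    make_token(TEXT, "jane@example.com", "EMAIL", 0.99),
]

# A lower threshold favours recall: more text is masked.
print(redact(TEXT, TOKENS, threshold=0.5))  # Contact [NAME] at [EMAIL] today.
# A higher threshold favours precision: the low-confidence surname slips through.
print(redact(TEXT, TOKENS, threshold=0.9))  # Contact [NAME] Doe at [EMAIL] today.
```

The two printed lines show why the choice of operating point matters in regulated settings: a stricter threshold raises fewer false alarms but can leave part of a name in the text.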