Rushil presents DP‑Fusion demo

- Rushil Thareja presented DP-Fusion at ICLR 2026, showing a way to make large language model outputs less revealing by limiting how much any sensitive input token can shape each generated token.
- The method runs a model on both redacted and private text, then blends the next-token probability distributions under a differential privacy bound; the paper reports 6x lower perplexity than related methods.
- DP-Fusion arrived as on-device and retrieval privacy became a bigger concern for AI systems that handle private documents and database results. (openreview.net)

Large language models can leak secrets at inference time, and Rushil Thareja used an ICLR 2026 poster to present a defense called DP-Fusion. (iclr.cc) (openreview.net) The basic problem is simple: a model can read private text in a prompt, a file, or a database lookup, then reveal clues about it in its answer. DP-Fusion targets that inference stage, after training, when the model is already deployed. (mbzuai.ac.ae) (arxiv.org)

Differential privacy is a mathematical guarantee that a system's outputs stay statistically similar even when a secret input changes. In DP-Fusion, the “secret” can be a token or group of tokens such as a name, an address, or other personally identifiable information. (arxiv.org) (mbzuai.ac.ae)

The system first labels sensitive tokens, then runs the model with them redacted to get a baseline distribution for the next token. It runs the model again with the sensitive tokens included, then fuses the two distributions before sampling the output token. (openreview.net) (github.com) The fusion step carries the core claim: the final token distribution stays within a bounded statistical distance of the redacted baseline. The paper says that bound gives a provable limit on how much the output can reveal about the protected tokens. (arxiv.org) (github.com)

The privacy dial is the parameter epsilon (ε). At ε = 0, the method hides sensitive tokens entirely; larger ε values let the private context exert more influence and usually yield better text quality. (openreview.net) (arxiv.org)

The authors tested DP-Fusion on document privatization, where a model paraphrases a document containing private details so that an attacker cannot reliably infer those details from the rewrite. They report substantially stronger privacy-utility tradeoffs than related differentially private inference methods, including 6x lower perplexity (a text-quality measure where lower is better).
(openreview.net) (mbzuai.ac.ae)

The paper also says the same per-token influence bound can blunt jailbreak-style prompt injection, though document privatization is the main use case in its experiments. Such a bound matters for systems that mix user prompts with retrieved records, internal notes, or medical files. (arxiv.org) (mbzuai.ac.ae)

DP-Fusion was accepted as an ICLR 2026 poster, and Thareja’s code and Python package are already public. The pitch is not that models stop using private context, but that each generated token gets a measurable cap on how much any protected token can sway it. (openreview.net) (github.com)
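
The differential privacy idea described in the article has a standard textbook form, shown here for reference only. A randomized mechanism M is ε-differentially private if, for any two inputs x and x′ that differ only in one protected unit (for DP-Fusion, a sensitive token or token group) and any set of outputs S:

```latex
\Pr\left[\mathcal{M}(x) \in S\right] \le e^{\varepsilon} \, \Pr\left[\mathcal{M}(x') \in S\right]
```

Setting ε = 0 forces the two probabilities to be equal, so the output carries no information about the protected unit, which is the article's "hides sensitive tokens entirely" case. This is the generic definition; the paper's formal token-level guarantee may use a different DP variant and privacy accounting.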
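
The fusion step the article describes can be sketched in a few lines, as a minimal illustration under stated assumptions rather than the DP-Fusion package's API. It assumes the redacted pass and the private pass each produce a next-token probability vector, blends them as λ·p_private + (1 − λ)·p_public, and bisects for the largest λ whose divergence from the redacted baseline stays under a budget. The choice of order-2 Rényi divergence as the "statistical distance", the linear mixing rule, and the names `fuse`, `mix`, and `max_divergence` are all hypothetical; how the paper maps ε to a per-token budget is not reproduced.

```python
import math
import random


def renyi_divergence(p, q, alpha=2.0):
    """Order-alpha Renyi divergence D_alpha(p || q) of two discrete distributions, alpha > 1."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # zero-mass outcomes contribute nothing
        if qi == 0.0:
            return math.inf  # p puts mass where the baseline puts none
        total += pi ** alpha * qi ** (1.0 - alpha)
    return math.log(total) / (alpha - 1.0)


def mix(p_private, p_public, lam):
    """Hypothetical linear blend: lam * private + (1 - lam) * public."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_private, p_public)]


def fuse(p_private, p_public, max_divergence, alpha=2.0, steps=40):
    """Largest lam in [0, 1] (to bisection precision) keeping the blend within
    max_divergence of the redacted baseline. Returns (lam, fused_distribution)."""
    if max_divergence <= 0.0:
        return 0.0, list(p_public)  # no budget: output the redacted baseline only
    if renyi_divergence(p_private, p_public, alpha) <= max_divergence:
        return 1.0, list(p_private)  # the full private distribution already fits
    lo, hi = 0.0, 1.0  # invariant: lo is within budget, hi is not
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if renyi_divergence(mix(p_private, p_public, mid), p_public, alpha) <= max_divergence:
            lo = mid
        else:
            hi = mid
    return lo, mix(p_private, p_public, lo)


# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
p_public = [0.2, 0.5, 0.3]   # pass with sensitive tokens redacted
p_private = [0.7, 0.2, 0.1]  # pass with sensitive tokens visible
lam, fused = fuse(p_private, p_public, max_divergence=0.1)
next_token = random.Random(0).choices(range(len(fused)), weights=fused)[0]
```

Bisection is valid here because, along this linear blend, the divergence from the baseline is zero at λ = 0 and never decreases as λ grows, so the acceptable values of λ form an interval starting at 0. With `max_divergence` set to 0 the sketch returns the redacted baseline unchanged, mirroring the article's ε = 0 case.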
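
The "6x lower perplexity" result is a comparison on perplexity, which is the exponential of the average negative log-probability a language model assigns to each token of a text; lower values mean the text looks more natural to the scoring model. A minimal sketch of the formula, assuming the per-token probabilities are already available; which model scores the text, and on which outputs, follows the paper's evaluation setup and is not reproduced here.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability the model assigned to each actual token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# A model that always gives the correct token probability 0.25 is as uncertain
# as a uniform guess over 4 options, so its perplexity is 4 (up to rounding).
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```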
