DP‑Fusion enables token‑level differential privacy

- Rushil Thareja and co-authors presented DP-Fusion at ICLR 2026, describing a way to give large language model outputs token-level differential privacy during inference.
- The method runs the model twice, with and without sensitive tokens, then blends token probabilities; the paper reports 6× lower perplexity than prior methods.
- It targets privacy leaks in retrieval and document rewriting, not training-time memorization. (openreview.net)

Differential privacy is a mathematical rule for limiting what an output can reveal about any one secret input. DP-Fusion applies that rule one token at a time while a language model is generating text. (openreview.net)

The paper, “DP-Fusion: Token-Level Differentially Private Inference for Large Language Models,” was accepted as an ICLR 2026 poster and presented April 25, 2026. The authors are Rushil Thareja, Preslav Nakov, Praneeth Vepakomma, and Nils Lukas. (openreview.net) (iclr.cc)

The problem is inference-time leakage: a model can reveal details from the document or database it was given at prompt time, even if the model itself was never trained on that data. The paper focuses on documents containing personally identifiable information that need to be paraphrased without exposing the hidden terms. (arxiv.org)

DP-Fusion’s recipe is simple on paper. First, mark the sensitive tokens. Then run the model once without them to get a baseline, run it again with them present, and fuse the two next-token distributions before sampling. (arxiv.org) (openreview.net)

That fusion step is the core claim. The authors say it bounds how much the sensitive tokens can change the probability of each generated token, giving a formal differential privacy guarantee at generation time. (openreview.net) (github.com)

The privacy knob is epsilon, a standard differential privacy parameter. In the paper’s description, setting epsilon to 0 hides sensitive tokens entirely, while larger values allow more influence from the private context in exchange for better text quality. (arxiv.org)

The authors report a utility gain over earlier differentially private inference methods. Their headline number is 6× lower perplexity, a common measure of how predictable text is to a language model, where lower is better. (openreview.net) (iclr.cc)

They also frame the method as relevant beyond document sanitization. The paper says the same per-token influence bound can mitigate jailbreak-style prompt injection, because malicious prompt tokens are treated as tokens whose effect on output must be limited. (arxiv.org)

The code is already public in two forms: an official research repository and a Python package called `dp-fusion-lib`. The package README says it supports PII detection, rewriting, and formal \((\epsilon,\delta)\)-differential privacy for text generation workflows. (github.com 1) (github.com 2)

That makes DP-Fusion a different kind of privacy claim from private fine-tuning. It is aimed at the moment a model reads sensitive context and writes an answer, which is where retrieval-augmented and document-processing systems often handle private data. (openreview.net) (arxiv.org)
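
The two-pass recipe described above (one pass without the sensitive tokens, one with them, then a fusion before sampling) can be illustrated with a rough sketch. This is not the authors' algorithm and does not use the `dp-fusion-lib` API. It assumes the two next-token distributions are already available as plain probability lists, and it uses a simple stand-in constraint: the blended probability of every token must stay within a factor of `exp(bound)` of the baseline. The function names, the `bound` parameter, and the bisection search are all hypothetical choices made for illustration; the divergence measure and privacy accounting used in the paper may differ.

```python
import math
import random


def mix(p_public, p_private, lam):
    """Blend two next-token distributions; lam=0 ignores the private pass."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(p_public, p_private)]


def max_log_ratio(p, q):
    """Largest absolute log-probability ratio between p and q over all tokens."""
    worst = 0.0
    for a, b in zip(p, q):
        if a == 0.0 and b == 0.0:
            continue  # token impossible under both distributions
        if a == 0.0 or b == 0.0:
            return math.inf
        worst = max(worst, abs(math.log(a / b)))
    return worst


def fuse(p_public, p_private, bound, steps=40):
    """Find the largest blend weight whose result stays within `bound` of the baseline.

    Illustrative assumption: `bound` caps the per-token log-ratio between the
    fused distribution and the public (sensitive-tokens-removed) baseline.
    """
    if max_log_ratio(p_private, p_public) <= bound:
        return 1.0, list(p_private)  # private pass already within the bound
    lo, hi = 0.0, 1.0  # lo is always a feasible weight, hi an infeasible one
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if max_log_ratio(mix(p_public, p_private, mid), p_public) <= bound:
            lo = mid
        else:
            hi = mid
    return lo, mix(p_public, p_private, lo)


def sample(probs):
    """Draw one token index from a probability list."""
    return random.choices(range(len(probs)), weights=probs)[0]


if __name__ == "__main__":
    # Toy three-token vocabulary; the numbers are made up for the example.
    p_public = [0.5, 0.3, 0.2]   # pass without sensitive tokens
    p_private = [0.1, 0.1, 0.8]  # pass with sensitive tokens
    for bound in (0.0, 0.5, 10.0):
        lam, fused = fuse(p_public, p_private, bound)
        print(bound, round(lam, 4), [round(x, 4) for x in fused], sample(fused))
```

In this toy version, a bound of 0 forces the blend weight to 0, so sampling uses only the baseline and the sensitive tokens have no influence, while a large bound lets the private pass through unchanged. That mirrors the trade-off the article describes for epsilon, though the sketch makes no claim about the formal guarantee of the real method.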
