Privacy headaches for AI systems
Privacy teams are flagging major gaps in how large language models (LLMs) and retrieval-augmented systems handle personal data. The core problems are PII redaction, data minimization, and enforcing the 'right to be forgotten' when agents reuse documents. (x.com) Regulators are also active: a breach at the UK outsourcer Capita exposed personal information from about 6.6 million people, including pension records, and drew fines totalling about £14 million, and a French health-pass exposure affected about 500,000 entries. (x.com) (x.com)
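A minimal Python sketch of PII redaction and data minimization applied before documents reach a retrieval index. The field allowlist, the record layout, and both regular expressions are illustrative assumptions, not a production detector. Real systems need much broader coverage (names, postal addresses, free-text identifiers), and the crude phone pattern will also catch some dates and reference numbers.

```python
import re

# Illustrative patterns only (assumption): real PII detection needs far broader
# coverage and usually a dedicated tool. The phone pattern is deliberately crude.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}

# Hypothetical allowlist: keep only the fields the retrieval use case needs.
ALLOWED_FIELDS = {"doc_id", "title", "body"}


def redact(text: str) -> str:
    """Replace each matched PII span with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def minimize_and_redact(record: dict) -> dict:
    """Drop fields outside the allowlist, then redact the remaining string values."""
    kept = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    return {k: redact(v) if isinstance(v, str) else v for k, v in kept.items()}


if __name__ == "__main__":
    raw = {
        "doc_id": "pension-042",
        "title": "Transfer request",
        "body": "Contact jane.doe@example.com or +44 20 7946 0958.",
        "date_of_birth": "1961-03-14",  # not needed for retrieval, so dropped
    }
    print(minimize_and_redact(raw))
```

Dropping unneeded fields first means the redactor has less text to get wrong, which is the point of minimization: data that never enters the index never has to be erased from it.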
Large language models generate text by predicting the next token (roughly, the next word or word fragment), and retrieval systems work by pulling stored documents back into the prompt. Privacy lawyers say that combination can turn deleted, hidden, or over-collected personal data into fresh output. (ico.org.uk)
The United Kingdom Information Commissioner’s Office said in its response to its generative artificial intelligence consultation that developers and deployers still need to apply the United Kingdom General Data Protection Regulation and the Data Protection Act 2018 to these systems. Its broader artificial intelligence guidance says data protection duties sit at the core of design, testing, and deployment. (ico.org.uk 1) (ico.org.uk 2) In practice, that means a chatbot cannot treat every document it can reach as fair game. The Information Commissioner’s Office says organizations should map where personal information is used, justify why they keep it, remove irrelevant data, and consider anonymised or synthetic data instead. (ico.org.uk)
European regulators are pressing on a second problem: whether a model or agent can still leak personal data after the source data is supposed to be gone. In a December 2024 opinion, the European Data Protection Board said an artificial intelligence model trained on personal data cannot automatically be treated as anonymous, and that regulators should test whether personal data can be extracted from it, either directly or through prompts. (edpb.europa.eu)
That question runs straight into the “right to be forgotten,” the rule that lets people ask for erasure of their personal data in many cases. On 18 February 2026, the European Data Protection Board said 32 data protection authorities had joined a coordinated review of erasure compliance, with 764 controllers responding and regulators identifying seven recurring challenges. (edpb.europa.eu) Security failures are keeping the issue concrete.
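The extraction testing the European Data Protection Board describes can be approximated, for a retrieval-backed agent, by sending probe prompts and checking the answers for values that should no longer be reachable. This sketch assumes a hypothetical `generate` callable standing in for the model or agent, plus a stub agent whose index still holds a record that was meant to be erased. It only detects verbatim, case-insensitive leaks, not paraphrases or partial matches.

```python
from typing import Callable, Iterable

# Hypothetical stand-in for an agent's retrieval index that still holds a
# record which was supposed to be erased.
STALE_INDEX = {"pension-042": "Jane Doe, 12 Example Street"}


def fake_agent(prompt: str) -> str:
    """Stub agent: returns any indexed record whose ID appears in the prompt."""
    for doc_id, text in STALE_INDEX.items():
        if doc_id in prompt:
            return f"Found record: {text}"
    return "No matching record."


def probe_for_leaks(
    generate: Callable[[str], str],
    prompts: Iterable[str],
    protected: Iterable[str],
) -> list[tuple[str, str]]:
    """Return (prompt, value) pairs where a protected value appears in the output.

    Matching is verbatim and case-insensitive, so paraphrased or partial
    leaks are not detected.
    """
    values = list(protected)
    leaks = []
    for prompt in prompts:
        output = generate(prompt).lower()
        for value in values:
            if value.lower() in output:
                leaks.append((prompt, value))
    return leaks


if __name__ == "__main__":
    probes = ["Summarise file pension-042", "What do you know about Jane Doe?"]
    erased = ["Jane Doe", "12 Example Street"]
    for prompt, value in probe_for_leaks(fake_agent, probes, erased):
        print(f"LEAK via {prompt!r}: {value!r}")
```

With this stub, only the probe that names the document ID surfaces the erased record; the name-based probe misses it. A useful probe set therefore needs to cover several routes to the same record, not just the obvious one.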
On 15 October 2025, the United Kingdom Information Commissioner’s Office fined Capita plc £8 million and Capita Pension Solutions Limited £6 million after a March 2023 cyberattack that exposed personal information from 6.6 million people, including pension and staff records. (ico.org.uk)
French regulators have also been escalating enforcement. France's data protection authority, the Commission nationale de l'informatique et des libertés (CNIL, in English the National Commission on Informatics and Liberty), said on 5 February 2025 that it had issued 87 sanctions in 2024, up from 42 in 2023, with fines totalling €55.2 million. (cnil.fr) Health data gets extra scrutiny because it can identify a person and reveal their medical history at the same time. The CNIL said on 12 September 2024 that it had fined Cegedim Santé €800,000 for processing health data without authorization. (cnil.fr)
The technical problem underneath all of this is simple: if an agent copies a document into a prompt, indexes it for retrieval, or stores it in logs, deleting the original file may not delete every downstream copy. Regulators are now asking companies to prove where the data went, who can query it, and whether it can still be pulled back out. (ico.org.uk) (edpb.europa.eu)
The result is that privacy compliance for artificial intelligence is shifting from paperwork to system design. Companies that can show deletion paths, redaction controls, and narrower data collection will have an easier time with the next regulator's letter than those still treating prompts, indexes, and model outputs as separate worlds. (ico.org.uk) (edpb.europa.eu)
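One way to make deletion paths provable is to record every downstream copy of a source document at the moment it is written, so an erasure request can follow each copy and leave an audit trail. This Python sketch is a hypothetical design: `Stores`, `LineageRegistry`, and the three in-memory stores are illustrative stand-ins for a file store, a retrieval index, and prompt logs, not any particular product's API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Stores:
    """Hypothetical in-memory stand-ins for places a document's text can end up."""

    files: dict = field(default_factory=dict)       # source_id -> original text
    index: dict = field(default_factory=dict)       # chunk_id -> indexed text
    prompt_log: dict = field(default_factory=dict)  # log_id -> logged prompt


class LineageRegistry:
    """Tracks every downstream copy of a source document so erasure can follow it."""

    STORE_NAMES = ("files", "index", "prompt_log")

    def __init__(self, stores: Stores) -> None:
        self.stores = stores
        self.copies = defaultdict(set)  # source_id -> {(store_name, key), ...}

    def record(self, source_id: str, store_name: str, key: str, text: str) -> None:
        """Write text into a store and remember that it derives from source_id."""
        getattr(self.stores, store_name)[key] = text
        self.copies[source_id].add((store_name, key))

    def erase(self, source_id: str) -> list[tuple[str, str]]:
        """Delete every recorded copy; return the (store, key) pairs removed."""
        removed = []
        for store_name, key in sorted(self.copies.pop(source_id, set())):
            getattr(self.stores, store_name).pop(key, None)
            removed.append((store_name, key))
        return removed

    def remaining(self, needle: str) -> list[str]:
        """Scan every store for a value, to check that erasure actually worked."""
        hits = []
        for store_name in self.STORE_NAMES:
            for key, text in getattr(self.stores, store_name).items():
                if needle in text:
                    hits.append(f"{store_name}:{key}")
        return hits


if __name__ == "__main__":
    registry = LineageRegistry(Stores())
    registry.record("doc-1", "files", "doc-1", "Jane Doe pension transfer")
    registry.record("doc-1", "index", "doc-1#0", "Jane Doe pension transfer")
    registry.record("doc-1", "prompt_log", "log-7", "User asked about Jane Doe")
    print(registry.erase("doc-1"))       # audit trail of removed copies
    print(registry.remaining("Jane Doe"))  # should be empty after erasure
```

The closing scan matters because a registry only knows about copies it was told about. Data a model absorbed during training, or text exported outside these stores, would not show up here, which is the gap that extraction testing is meant to probe.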