Canada finds OpenAI violated privacy
- Canada’s federal privacy commissioner and regulators in Quebec, B.C., and Alberta said on May 6 that OpenAI’s early ChatGPT training broke Canadian privacy law.
- The probe focused on ChatGPT 3.5 and 4, and flagged overcollection, weak consent, shaky transparency, hallucinated personal data, and poor deletion controls.
- OpenAI has already changed some practices, but Canada’s finding raises the bar for how AI companies source and govern training data.

Privacy law is the domain here, but the real stake is how AI companies get to build models in the first place. Canada’s federal privacy commissioner and three provincial regulators just said OpenAI crossed the line when it trained early versions of ChatGPT on personal information. The gap was simple: scrape first, sort out consent and deletion later. On May 6, that approach got a formal legal rebuke in Canada.

### What did Canada actually decide?

The joint investigation came from the federal privacy commissioner plus regulators in Quebec, British Columbia, and Alberta. It looked at whether OpenAI’s collection, use, and disclosure of Canadians’ personal information through ChatGPT complied with private-sector privacy laws. The answer, for the early models they examined, was basically no: regulators said the way ChatGPT was initially trained was not compliant.

### Which versions were under the microscope?

This was not a ruling about every OpenAI system as it exists today. The investigation looked at models in use when the case began in 2023, specifically ChatGPT 3.5 and ChatGPT 4. The complaint that triggered it was filed in April 2023, and the joint probe was announced the next month, so regulators were judging the company’s earlier development and rollout choices.

### What did regulators say OpenAI got wrong?

The findings are broader than “you scraped the web.” Regulators said OpenAI overcollected personal information, failed to get valid consent, and did not give people enough transparency about how their data could be pulled from public sources like forums and social platforms. They also said ChatGPT could generate inaccurate or fabricated personal information about people, and that the controls for deleting that information were poor.

### Why is consent the hard part?

Because “publicly accessible” does not automatically mean “fair game for any AI training purpose.” That is the core clash here.
Canada’s regulators are saying that if a system ingests huge amounts of personal data, including potentially sensitive details like health information, political views, or children’s data, the company still needs consent before that data is absorbed into a model.

### Did Canada order ChatGPT to shut down?

No, and that is an important nuance. The federal commissioner said the complaint was “well-founded and conditionally resolved” because OpenAI has already implemented some changes and committed to more in the coming months. Regulators said the company has significantly limited the personal and sensitive information used to train new ChatGPT models, and they plan to monitor whether those commitments hold.

### So why does this still matter?

Because this is a direct shot at the old AI industry default: gather massive datasets first, then clean up edge cases later. Canada is saying privacy compliance has to be built into model development itself: data sourcing, consent, transparency, accuracy, deletion, accountability, the whole chain. That does not just affect OpenAI. It gives every model builder a clearer warning that “the open web” is not a blanket exemption.

### What changes for AI companies now?

The practical burden is heavier. Companies will need tighter records on where training data came from, what personal information was included, what legal basis covers it, and how a person can get bad or sensitive information removed. Basically, provenance stops being a nice-to-have research problem and becomes a compliance problem. That is especially true because the laws were written long before generative AI.

### Bottom line

This was not Canada banning ChatGPT. It was Canada telling AI companies that scale is not a defense: if you train on people’s data, privacy law still follows the data.