AI Giants Train on User Chats By Default

A Stanford analysis flags that major AI companies like OpenAI and Google train their models on user conversations by default. This chat data is often merged with broader datasets to build user profiles, raising significant trust issues for personalized apps and fueling calls for opt-in policies and federal regulation.

A key finding from the Stanford analysis is that the data collected is not limited to chat inputs; it can also include uploaded files and information from other products within the company's ecosystem. Merging data from these sources helps build more comprehensive user profiles. Some companies may retain chat logs indefinitely, even after a user has deleted their account.

Both OpenAI and Google use human reviewers to analyze a subset of user conversations to improve their AI models. For Google's Gemini, conversations that have been reviewed by humans can be stored for up to three years, even if the user later deletes their activity. OpenAI also retains the right to review conversations for safety purposes for up to 30 days, even if a user has opted out of data use for model training.

Opting out of this data collection is possible but not always straightforward. For ChatGPT, users need to navigate to "Settings" and then "Data Controls" to turn off the "Improve the model for everyone" toggle. On Google Gemini, the setting is found under "Gemini Apps Activity," where users can choose to turn off activity tracking. However, even with training turned off, Google may still store conversations for up to 72 hours for service operation.

This default data collection occurs in a legal gray area, as there is no comprehensive federal privacy law in the United States mandating an "opt-in" approach for AI training. The American Data Privacy and Protection Act (ADPPA) was proposed but did not become law. This leaves a patchwork of state-level regulations, such as the Colorado Artificial Intelligence Act, which is set to become effective in June 2026 and requires companies to disclose information about the data used to train their AI systems.

The practice of using chat data by default is having a tangible impact on public trust. According to the 2025 Stanford AI Index Report, global confidence that AI companies will protect personal data has declined. This erosion of trust coincides with a significant increase in AI-related privacy and security incidents, which jumped 56.4% in a single year.

The user experience for opting out differs between platforms. OpenAI's ChatGPT offers more granular controls, allowing users to disable training while retaining their chat history. In contrast, turning off "Gemini Apps Activity" in Google's ecosystem can also disable integrations with services like Gmail and Google Docs, forcing users to choose between full functionality and data privacy.

The scale of data involved is immense. As of early 2026, OpenAI's ChatGPT has 900 million weekly active users, who send over 2.5 billion prompts daily. The vast majority of these users are on the free tier, where data is used for training by default; enterprise-level plans for both OpenAI and Google typically exclude customer data from model training by default.

This widespread data collection has led to legal challenges. Major AI companies, including OpenAI and Google, have faced class-action lawsuits alleging the misuse of personal information and copyrighted material for training their AI systems. These lawsuits highlight the ongoing tension between the need for vast datasets to advance AI capabilities and individual privacy rights.
