Anthropic Accuses Chinese Firms of Data Scraping
Anthropic has reportedly accused three Chinese AI firms of creating 24,000 fake accounts to generate 16 million prompts. The alleged activity was intended to scrape outputs from Anthropic's models so the firms could train their own, raising concerns about intellectual property risks and competitive ethics in the AI industry.
- The three Chinese firms accused by Anthropic are DeepSeek, Moonshot AI, and MiniMax. MiniMax is alleged to have run the largest operation, generating over 13 million exchanges with Anthropic's model, Claude.
- The technique used is called "distillation," where a less capable AI model is trained using the outputs of a more advanced one to quickly improve its performance at a fraction of the cost. While distillation can be a legitimate internal development practice, Anthropic claims this use violates its terms of service.
- To bypass Anthropic's ban on commercial access from China, the firms allegedly used commercial proxy services that manage large networks of fraudulent accounts, with one network controlling over 20,000 accounts at once.
- The alleged scraping campaigns specifically targeted Claude's most advanced capabilities, including complex reasoning, coding, and the ability to use other software tools. For example, DeepSeek focused on tasks that would reveal the model's step-by-step internal reasoning.
- This is not an isolated incident; Anthropic's competitor OpenAI made similar accusations to U.S. lawmakers earlier this month, also pointing to DeepSeek for using distillation techniques.
- Anthropic argues this issue poses a national security risk, as models created through illicit distillation might lack the original safety features designed to prevent misuse, such as developing bioweapons or enabling cyberattacks.
- The accusations have drawn criticism from some industry observers, including Elon Musk, who argue that major AI labs like Anthropic also train their models on vast datasets scraped from the internet, and have faced their own lawsuits over data practices.
- In response, Anthropic is developing systems to detect distillation attack patterns, sharing intelligence with other AI labs and cloud providers, and tightening its account verification processes.