Google Thwarts Gemini Model Theft

Google blocked a large-scale model extraction attack against its flagship Gemini AI. The attempt reportedly involved hackers using over 100,000 carefully crafted prompts in an effort to clone the proprietary model. Google has warned of potential legal action against the perpetrators.

- Model extraction attacks aim to reverse-engineer a proprietary AI by systematically querying it to build a surrogate model that behaves similarly. This form of intellectual property theft allows competitors to replicate a model's capabilities at a fraction of the cost. The attackers in the Gemini incident used a technique called "reasoning trace coercion," attempting to force the model to reveal its internal reasoning processes.
- The attempt on Gemini was part of a broader pattern of misuse identified by Google's Threat Intelligence Group (GTIG), which has also observed state-sponsored actors from China, Iran, North Korea, and Russia using the AI for malicious purposes. These activities include generating phishing campaigns, debugging malicious code, and researching targets.
- One specific North Korean-linked group, identified as UNC2970, used Gemini to gather open-source intelligence on defense contractors and cybersecurity firms to aid in phishing campaigns. Another group, an Iranian actor known as APT42, leveraged the AI to craft targeted social engineering campaigns by generating culturally relevant and nuanced communications.
- Attackers do not need access to the model's underlying architecture or training data to perform an extraction attack; they only need the ability to repeatedly query the model's API and analyze the outputs. In a proof-of-concept, researchers achieved 80.1% accuracy in a replica model after just 1,000 queries.
- Besides direct theft, stolen models can be used to discover new vulnerabilities, create adversarial examples to fool the original model, or reconstruct sensitive information from the training data. Google noted that this specific attack did not threaten user data but was a direct risk to service providers and model builders.
- Google is not the only company facing such threats. OpenAI recently accused Chinese AI startup DeepSeek of using "distillation" techniques to illicitly replicate its advanced models, highlighting a growing trend of competitive espionage in the AI sector.
- Defenses against model extraction include limiting API access, implementing rate limiting on queries, monitoring for unusual usage patterns, and adding "noise" or watermarks to the model's predictions to make replication more difficult.
- The legal landscape for AI model theft is still developing, with many lawsuits focused on copyright infringement related to training data. However, a case involving medical AI startup OpenEvidence and competitor Pathway Medical is testing whether "prompt injection" techniques used to extract proprietary information can be prosecuted as trade secret theft.
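
To make the rate-limiting and usage-monitoring defenses mentioned above more concrete, here is a minimal Python sketch of a per-client sliding-window limiter that also flags clients with a high cumulative query count. It is an illustration only and does not describe how Google or any other provider actually protects its models. The class name `QueryMonitor`, its parameters, and the thresholds are all hypothetical.

```python
from collections import defaultdict, deque


class QueryMonitor:
    """Hypothetical per-client sliding-window limiter with a volume flag."""

    def __init__(self, max_per_window=5, window_seconds=60.0, flag_total=20):
        self.max_per_window = max_per_window   # accepted queries allowed per window
        self.window_seconds = window_seconds   # window length in seconds
        self.flag_total = flag_total           # cumulative count that triggers a flag
        self._recent = defaultdict(deque)      # client_id -> accepted timestamps in window
        self._totals = defaultdict(int)        # client_id -> total accepted queries

    def allow(self, client_id, now):
        """Return True if a query arriving at time `now` (seconds) is accepted."""
        recent = self._recent[client_id]
        # Discard timestamps that have aged out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_per_window:
            return False  # over the per-window limit: reject this query
        recent.append(now)
        self._totals[client_id] += 1
        return True

    def is_suspicious(self, client_id):
        """Flag clients whose total accepted volume reaches the threshold."""
        return self._totals[client_id] >= self.flag_total


# Example: at most 3 queries per 10-second window, flag after 5 accepted queries.
monitor = QueryMonitor(max_per_window=3, window_seconds=10, flag_total=5)
results = [monitor.allow("client-a", t) for t in (0, 1, 2, 3, 11, 12)]
print(results)                             # [True, True, True, False, True, True]
print(monitor.is_suspicious("client-a"))   # True
```

Timestamps are passed in explicitly so the behavior is easy to reason about; a real service would more likely read a monotonic clock and combine volume checks with analysis of what the prompts actually ask for, since a patient attacker can stay under any fixed rate limit.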


Shared from Scout - Be the smartest in the room.