Study Finds Nearly Half of Shadow AI APIs Used in Research Are Fake
A startling study found that nearly half of the 17 AI API proxies used in 187 research papers were either fake or completely unverifiable. The findings point to significant reliability and reproducibility problems in the tooling that ML researchers depend on.
The study, titled "Real Money, Fake Models: Deceptive Model Claims in Shadow APIs," audited 17 third-party services that provide access to large language models. Researchers often use these "shadow APIs" to get around the high costs, payment barriers, or regional restrictions of official APIs for models such as GPT-4 and Gemini. The authors identified 187 academic papers that relied on shadow APIs, and one of the most popular services was cited nearly 6,000 times. Such services are especially common in regions where access to the official APIs is restricted. The official APIs are also often priced for enterprise customers, which puts them out of reach for many individual academics and students.
The investigation uncovered significant performance differences between the shadow and official APIs. On a medical benchmark, one model's accuracy plunged from 83.82% through the official API to around 37% through the shadow services. A gap of that size critically undermines the validity and reproducibility of research that relied on those services.
The problems extend beyond performance: 45.83% of the fingerprinting tests used to verify each model's identity failed. In other words, researchers may not even be using the model they believe they are, which further compounds the reproducibility crisis in machine learning research.
The issue of unverifiable tools is part of a larger conversation about research integrity in the age of AI. The academic community is already grappling with a surge in AI-generated fake papers and fabricated data, which threatens to erode public trust in scientific findings.
Reliable research is a cornerstone of scientific progress. When results cannot be reproduced, whether because of fake APIs or other factors, resources are wasted, innovation slows, and flawed conclusions can become the foundation for other researchers' work.