Grok Fails at Sports Betting

Research reported by PCMag found xAI’s Grok to be the worst mainstream chatbot at sports betting: it finished last among the models tested in a season-long betting simulation. The same coverage notes that xAI may still push Grok toward corporate adoption despite its consumer-facing weaknesses (pcmag.com).

Grok ranked last in a new sports-betting test of eight major chatbots, losing most of its simulated bankroll over a Premier League season. (pcmag.com) PCMag reported on April 12 that the benchmark came from General Reasoning and was first shared with the Financial Times. The test replayed the 2023-24 English Premier League season and asked each model to build a betting strategy from team data, prior matches, and odds. (pcmag.com)

General Reasoning said every model lost money on average, but Grok 4.20 finished worst, with a mean final bankroll of £11,814 from a £100,000 starting pot across three seeds, a loss of roughly 88%. Claude Opus 4.6 posted the best result at £89,035, and only Claude Opus 4.6 and GPT-5.4 avoided ruin in all three runs. (gr.inc)

The benchmark was built to test something ordinary chatbot quizzes miss: making decisions under uncertainty over months, not answering one prompt at a time. General Reasoning said the models often failed to stick to their own plans, adapt to new conditions, or manage risk coherently across a full season. (gr.inc)

That result cuts against Grok’s marketing as a model for real-time analysis. xAI has been pitching Grok Business and Grok Enterprise since December 30, 2025, with features such as custom single sign-on, directory sync, audit controls, and a separate Enterprise Vault for corporate customers. (x.ai)

The betting test also does not show that rival chatbots are good gambling tools. Even the best performer ended the season down, and many models went bust before the final matches. (gr.inc)

Sports betting is a hard benchmark because outcomes are never fully known in advance and luck can swamp a sound-looking prediction. That makes it closer to forecasting or trading than to a school-style exam with one right answer. (michaeltimbs.me)

Grok has also drawn scrutiny this year for reasons unrelated to betting. In January, Common Sense Media said Grok was “among the worst” chatbots it reviewed for child-safety protections, according to TechCrunch’s report on the findings. (techcrunch.com)

The narrower takeaway from the new benchmark is that a chatbot can look capable in short demos and still fall apart when it has to make repeated decisions with money on the line. Grok’s last-place finish put that gap in numbers. (pcmag.com)
