Stanford flags thin AI classroom gains
A Stanford study finds only thin evidence that classroom AI tools deliver lasting gains: short-term performance bumps often evaporate when the tech is removed. The findings argue for cautious, evidence-based adoption in schools, with tools piloted carefully and outcomes measured rather than adopted on the strength of AI hype. (govtech.com)
Stanford’s “The Evidence Base on AI in K-12: A 2026 Review” (released March 11, 2026) examined more than 800 papers in the AI Hub repository and singled out just 20 high-quality causal studies that can estimate whether AI tools actually changed student or educator outcomes. (scale.stanford.edu) The review found almost no high-quality causal studies conducted in U.S. K-12 classrooms, and many experiments were limited to a single, short-duration task (often a one-time 20-minute intervention). (scale.stanford.edu)

One rigorous example, the Tutor CoPilot randomized trial, involved 900 tutors and 1,800 K-12 students. It reported a 4 percentage-point increase in topic mastery overall and a 9 percentage-point gain for students paired with lower-rated tutors, at an estimated cost of $20 per tutor per year. (arxiv.org) A separate study, “Short-Term Gains, Long-Term Gaps,” used a sample of 123 students and found that ChatGPT and Google produced immediate advantages on low-order tasks, but those advantages fell away on later retention tests, and neither tool offered an advantage on higher-order tasks. (scale.stanford.edu)

Stanford’s usage analysis of the SchoolAI platform tracked 9,000 U.S. teachers who joined between Aug. 1 and Sept. 15, 2024. Over a 90-day window, 16% used the platform once, 43% were short-term users, 41% became regular users, and 1% were “power users.” (scale.stanford.edu) SCALE also launched a project to study ChatGPT in schools and announced a partnership with OpenAI for K-12 classroom data sharing, to investigate features like “study mode” and impacts on proficiency, retention, and engagement (announced July 29, 2025). (news.stanford.edu)

The report highlights actionable patterns: tools with pedagogical guardrails (step-by-step guidance) show promise, and AI can help scale tutor expertise and improve lower-rated instructors’ effectiveness. The literature still leaves equity and student wellness largely unexamined, outcomes that hinge on district funding and tool design. (scale.stanford.edu)