AI weather models underpredict record extremes

- A Science Advances study led by KIT and the University of Geneva says ECMWF’s physics-based HRES still beats top AI models on record-breaking extremes.
- The gap shows up in heat, cold, and wind records: GraphCast, Pangu-Weather, and Fuxi more often understated both frequency and intensity.
- That matters because AI weather systems are fast and cheap, but early-warning and disaster planning still need models that extrapolate safely.

Weather forecasting is one of AI’s biggest recent wins. Models like GraphCast and Pangu-Weather can generate global forecasts in minutes, often with accuracy that rivals much slower supercomputer runs. But the awkward question was always the obvious one: what happens when the atmosphere does something rare, ugly, and outside the historical pattern book? A new paper in *Science Advances*, published in early May 2026, says that’s exactly where today’s leading AI weather models still fall short.

### What actually changed?

The new result is not just another vague warning about “AI risk.” Zhongwei Zhang and colleagues compared ECMWF’s physics-based High RESolution forecast model, or HRES, against major AI systems including GraphCast, GraphCast operational, Pangu-Weather, Pangu-Weather operational, and Fuxi. For record-breaking heat, cold, and wind events, HRES consistently came out ahead across nearly all lead times. (science.org)

### What does “record-breaking extremes” mean here?

It means events that push beyond what the models effectively learned from past weather. The paper’s point is not that AI is bad at ordinary forecasting. It’s that unprecedented events are a different test. Those are the cases that strain emergency planning most: the heat wave that beats the old local record by a wide margin, or the wind event that lands outside the usual envelope. (science.org)

### Where do the AI models miss?

The pattern is pretty specific. The researchers say the AI systems tended to underestimate both the frequency and the intensity of record-breaking events. They also found a directional bias: forecasts ran too low for hot records and too high for cold records, with errors growing as the event pushed farther beyond prior records. Basically, the more “unseen” the weather became, the shakier the AI extrapolation got. (science.org)

### Why would that happen?

Because these models are trained on history. That works brilliantly when tomorrow looks enough like some remix of yesterday. But rare extremes are, by definition, thinly represented in the training data. Physics-based models have their own flaws, but they are built around conservation laws and atmospheric dynamics, so they have a sturdier way to behave when the system enters territory that humans have barely observed. Think of it as pattern memory versus rule-based reasoning. (science.org)

### Haven’t researchers seen this before?

Yes, and that is what gives the new paper weight. A 2025 study in *Artificial Intelligence for the Earth Systems* found that machine-learning weather models did not consistently beat HRES on high-impact case studies like the 2021 Pacific Northwest heat wave and a South Asian humid-heat event, and they lacked some impact-relevant variables. Another 2025 study, in *PNAS*, tested “gray swan” events and found neural-network weather models could fail when asked to forecast extremes beyond their training range, including the strongest tropical cyclones. (science.org)

### So is the AI weather boom over?

No, not even close. These systems are still fast, cheap, and often extremely good for day-to-day forecasting. One reason people got excited in the first place is that they can approach top-tier forecast skill while using vastly less computation. That makes them useful now, especially as supplements, ensembles, or rapid first-pass tools. The catch is that “usually good” is not the same as “safe to trust alone when stakes are highest.” (journals.ametsoc.org)

### What are people likely to do with this?

Probably not abandon AI, but hybridize it harder. The direction of travel is pretty clear: bake in more physics, train on better synthetic extreme-event data, and evaluate models on impact cases instead of average benchmark scores. The message from this week’s paper is narrower than the social-media version, but more useful: AI weather models are real progress, just not a full replacement for physics-based forecasting when records are on the line. (news.ucsc.edu)

### Bottom line

AI weather forecasts are impressive, but record extremes are still the stress test that matters most. Right now, physics still has the edge where planners, emergency managers, and forecasters can least afford a miss. (science.org)
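For readers who want to see the “pattern memory versus rule-based reasoning” idea in miniature, here is a toy sketch. It is not the paper’s method and not how GraphCast, Pangu-Weather, or Fuxi work. The linear “law,” the `driver` variable, and the nearest-neighbor forecaster are all assumptions made up for illustration. The point is only that a forecaster that averages similar past cases cannot go above anything it has already seen, so its shortfall grows as the truth moves farther past the old record.

```python
# Toy illustration only. Every name and number here is hypothetical.

def true_temperature(driver: float) -> float:
    """Stand-in 'physical rule': temperature rises linearly with a driver."""
    return 10.0 + 2.0 * driver


def knn_forecast(driver: float, history: list, k: int = 3) -> float:
    """Pattern memory: average the outcomes of the k most similar past cases."""
    nearest = sorted(history, key=lambda pair: abs(pair[0] - driver))[:k]
    return sum(temp for _, temp in nearest) / k


# Past weather only covers drivers 0.0 to 10.0, so the old record is 30.0.
history = [(d / 2, true_temperature(d / 2)) for d in range(21)]
old_record = max(temp for _, temp in history)

# Now push the driver past anything in the historical record.
shortfalls = []
for driver in (10.5, 11.0, 12.0, 14.0):
    truth = true_temperature(driver)
    memory = knn_forecast(driver, history)
    shortfalls.append(truth - memory)
    print(
        f"beats old record by {truth - old_record:4.1f} | "
        f"truth {truth:5.1f} | pattern-memory forecast {memory:5.1f} | "
        f"shortfall {truth - memory:4.1f}"
    )
```

In this toy, the pattern-memory forecast stays stuck just under the old record while the truth keeps climbing. Real AI weather models are far more capable than a nearest-neighbor average, and real physics-based models are not perfect rules, so treat this as intuition rather than evidence.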
