Traditional models still ‘outperform AI’ for extreme weather forecasts

Wednesday, 29 April 2026

Computer models that use artificial intelligence (AI) cannot forecast record-breaking weather as well as traditional weather models, according to a new study.

It is well established that AI weather models have surpassed traditional, physics-based models for some aspects of weather forecasting.

However, new research published in Science Advances finds that AI models still “underperform” in forecasting record-breaking extreme weather events.

The authors tested how well both AI and traditional weather models could simulate hundreds of thousands of record-breaking hot, cold and windy events that were recorded in 2018 and 2020.

They find that AI models underestimate both the frequency and intensity of record-breaking events.

A study author tells Carbon Brief that the analysis is a “warning shot” against replacing traditional models with AI models for weather forecasting “too quickly”.

§ AI weather forecasts

Extreme weather events, such as floods, heatwaves and storms, drive hundreds of billions of dollars in damages every year through the destruction of cropland, impacts on infrastructure and the loss of human life.

Many governments have developed early warning systems to prepare the general public and mobilise disaster response teams for imminent extreme weather events. These systems have been shown to minimise damages and save lives.

For decades, scientists have used numerical weather prediction models to simulate the weather days, or weeks, in advance.

These models rely on a series of complex equations that reproduce processes in the atmosphere and ocean. The equations are rooted in fundamental laws of physics, based on decades of research by climate scientists. As a result, these models are referred to as “physics-based” models.

However, AI-based weather models are gaining popularity as an alternative for weather forecasting.

Instead of using physics, these models use a statistical approach. Scientists present AI models with a large batch of historical weather data, known as training data, which teaches the model to recognise patterns and make predictions.

To produce a new forecast, the AI model draws on this bank of knowledge and follows the patterns that it knows.

There are many advantages to AI weather forecasts. For example, they use less computing power than physics-based models, because they do not have to solve thousands of mathematical equations.

Furthermore, many AI models have been found to perform better than traditional physics-based models at weather forecasting.

However, these models also have drawbacks.

Study author Prof Sebastian Engelke, a professor at the research institute for statistics and information science at the University of Geneva, tells Carbon Brief that AI models “depend strongly on the training data” and are “relatively constrained to the range of this dataset”.

In other words, AI models struggle to simulate brand new weather patterns, instead tending to forecast events of a similar strength to those seen before. As a result, it is unclear whether AI models can simulate unprecedented, record-breaking extreme events that, by definition, have never been seen before.
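As a deliberately simplified illustration of that constraint, the Python sketch below uses a hypothetical nearest-neighbour "analogue" forecaster. It is far simpler than GraphCast, Pangu-Weather or Fuxi and is an assumption chosen only for clarity: it predicts tomorrow by copying what followed the most similar day in its training data, so it can never forecast a value hotter than anything it has already seen.

```python
def analogue_forecast(training_pairs, today):
    """Predict tomorrow's value by copying what followed the most similar past day.

    training_pairs: list of (today_value, tomorrow_value) tuples from history.
    This toy model is hypothetical and only illustrates the limitation
    of learning purely from past examples.
    """
    # Find the historical day whose conditions were closest to today's.
    closest = min(training_pairs, key=lambda pair: abs(pair[0] - today))
    # Return whatever happened on the following day in that historical case.
    return closest[1]


# Hypothetical daily maximum temperatures (C): (today, tomorrow)
history = [(28.0, 29.5), (30.0, 31.0), (32.0, 33.5), (34.0, 35.0)]

# Even if today is an unprecedented 40C, the forecast cannot exceed
# the largest "tomorrow" value in the training data.
forecast = analogue_forecast(history, today=40.0)
print(forecast)  # 35.0
```

Neural-network models are not strictly bounded in this way, so the toy exaggerates the effect; it only shows why a model that learns from past examples tends to stay within the range of those examples.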

§ Record-breaking extremes

Extreme weather events are becoming more intense and frequent as the climate warms. Record-shattering extremes – those that break existing records by large margins – are also becoming more regular.

For example, during a 2021 heatwave in the north-western US and Canada, local temperature records were broken by up to 5C. According to one study, the heatwave would have been “impossible” without human-caused climate change.

The new study explores how accurately AI and physics-based models can forecast such record-breaking extremes.

First, the authors identified every heat, cold and wind event in 2018 and 2020 that broke a record previously set between 1979 and 2017. (They chose these years due to data availability.) The authors used ERA5 reanalysis data to identify these records.
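The record-identification step can be sketched in Python under simplifying assumptions: each location has a list of daily values, a heat record in the test year is any day that exceeds the maximum over the 1979-2017 reference period, and a cold record is any day below the reference minimum (a wind record would follow the heat pattern). The helper names and data layout are hypothetical, and the study's actual processing of ERA5 data is not reproduced here.

```python
def find_heat_records(reference_values, test_values):
    """Return (day_index, value, margin) for each test-year day that beats
    the reference-period maximum at one location.

    Hypothetical helper: reference_values are daily values from the
    reference period (e.g. 1979-2017); test_values come from the year tested.
    """
    record = max(reference_values)
    return [
        (day, value, value - record)
        for day, value in enumerate(test_values)
        if value > record
    ]


def find_cold_records(reference_values, test_values):
    """Same idea for cold records: days colder than the reference minimum."""
    record = min(reference_values)
    return [
        (day, value, record - value)
        for day, value in enumerate(test_values)
        if value < record
    ]


# Hypothetical daily values for one grid cell (C).
reference = [30.1, 31.4, 29.8, 32.0, 28.7]
test_year = [31.0, 32.6, 33.1, 30.2]
print([day for day, _, _ in find_heat_records(reference, test_year)])  # [1, 2]
```

Returning the margin alongside each record makes it possible to ask later how forecast skill changes with the size of the record.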

This produced a large sample size of record-breaking events. For the year 2020, the authors identified around 160,000 heat, 33,000 cold and 53,000 wind records, spread across different seasons and world regions.

For their traditional, physics-based model, the authors selected the High RESolution forecast model from the Integrated Forecasting System of the European Centre for Medium-Range Weather Forecasts. This is “widely considered as the leading physics-based numerical weather prediction model”, according to the paper.

They also selected three “leading” AI weather models: the GraphCast model from Google DeepMind, Pangu-Weather, developed by Huawei Cloud, and the Fuxi model, developed by a team from Shanghai.

The authors then assessed how accurately each model could forecast the extremes observed in the year 2020.

Dr Zhongwei Zhang is the lead author on the study and a researcher at Karlsruhe Institute of Technology. He tells Carbon Brief that many AI weather forecast models were built for “general weather conditions”, as they use all historical weather data to train the models. Forecasting extremes, he says, is considered a “secondary task” for these models.

The authors explored a range of different “lead times” – in other words, how far into the future the model is forecasting. For example, a lead time of two days could mean the model uses the weather conditions at midnight on 1 January to simulate weather conditions at midnight on 3 January.
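The link between the time a forecast starts and the time it refers to is simple date arithmetic. The Python sketch below, using a hypothetical helper name and 2020 as an assumed year, reproduces the example of a two-day (48-hour) lead time.

```python
from datetime import datetime, timedelta


def valid_time(initialisation, lead_time_hours):
    """Return the time a forecast refers to, given its start time and lead time."""
    return initialisation + timedelta(hours=lead_time_hours)


start = datetime(2020, 1, 1, 0, 0)  # midnight on 1 January 2020 (assumed year)
print(valid_time(start, 48))  # 2020-01-03 00:00:00
```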

The plot below shows how accurately the models forecasted all extreme events (left) and heat extremes (right) under different lead times. This is measured using “root mean square error” – a metric of how accurate a model is, where a lower value indicates lower error and higher accuracy.

The chart on the left shows how two of the AI models (blue and green) performed better than the physics-based model (black) when forecasting all weather across the year 2020.

However, the chart on the right illustrates how the physics-based model (black) performed better than all three AI models (blue, red and green) when it came to forecasting heat extremes.

Image - Accuracy of the AI models (blue, red and green) and the physics-based model (black) at forecasting all weather over 2020 (left) and heat extremes (right) over a range of lead times. This is measured using “root mean square error” (RMSE), a metric of how accurate a model is, where a lower value indicates lower error and higher accuracy. Source: Zhang et al (2026).
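Root mean square error (RMSE) squares each forecast error, averages the squares and takes the square root. The Python sketch below is a plain, unweighted version with invented temperatures; it does not reproduce any weighting or averaging choices the study may have made.

```python
import math


def rmse(forecasts, observations):
    """Root mean square error: square each error, average, then take the square root."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical temperatures (C) at three record-breaking locations.
observed = [40.0, 38.0, 42.0]
model_a = [39.0, 37.0, 41.0]  # 1C too cool everywhere
model_b = [37.0, 38.0, 40.0]  # errors of 3, 0 and 2C

print(rmse(model_a, observed))  # 1.0
print(rmse(model_b, observed))
```

Because errors are squared before averaging, a few large misses (such as badly underestimated record temperatures) raise the RMSE more than many small ones.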

The authors note that the performance gap between AI and physics-based models is widest at shorter lead times, indicating that AI models have greater difficulty making predictions in the near future.

They find similar results for cold and wind records.

In addition, the authors find that AI models generally “underpredict” temperature during heat records and “overpredict” it during cold records.
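RMSE treats every error as positive, so it does not show whether a model runs too warm or too cold. A signed mean error (bias) does. In the Python sketch below, with a hypothetical helper and invented numbers, underprediction during heat records appears as a negative bias and overprediction during cold records as a positive one.

```python
def mean_bias(forecasts, observations):
    """Average of (forecast - observation). Negative means the model runs too low."""
    errors = [f - o for f, o in zip(forecasts, observations)]
    return sum(errors) / len(errors)


# Hypothetical temperatures (C) on record-breaking days.
heat_obs = [41.0, 39.5]
heat_fc = [39.0, 38.5]      # too cool during heat records
cold_obs = [-30.0, -25.0]
cold_fc = [-27.0, -24.0]    # too warm during cold records

print(mean_bias(heat_fc, heat_obs))  # -1.5
print(mean_bias(cold_fc, cold_obs))  # 2.0
```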

The study finds that the larger the margin by which a record is broken, the less accurately the AI models predict the intensity of the event.
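One way to check such a relationship is to group record events by how far they beat the previous record and compare the average forecast error in each group. The Python sketch below assumes a hypothetical helper, arbitrary margin bins and invented events chosen only to show the calculation; the numbers are not results from the study.

```python
def error_by_margin(events, bin_edges):
    """Mean absolute intensity error for record events grouped by record margin.

    events: list of (margin, forecast, observed) tuples, where margin is how far
    the observed value beat the previous record. bin_edges: ascending margin
    thresholds (a hypothetical choice). Returns {(low, high): mean_abs_error}.
    """
    results = {}
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        errors = [abs(fc - obs) for margin, fc, obs in events if low <= margin < high]
        if errors:  # skip bins that contain no events
            results[(low, high)] = sum(errors) / len(errors)
    return results


# Invented events: (margin C, forecast C, observed C)
events = [
    (0.2, 35.1, 35.3),
    (0.5, 36.0, 36.5),
    (1.5, 37.0, 38.5),
    (3.0, 38.0, 41.0),
]
print(error_by_margin(events, [0.0, 1.0, 5.0]))
```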

§ ‘Warning shot’

Study author Prof Erich Fischer is a climate scientist at ETH Zurich and a Carbon Brief contributing editor. He tells Carbon Brief that the result is “not unexpected”.

He adds that the analysis is a “warning shot” against replacing traditional models with AI models for weather forecasting “too quickly”.

AI models are likely to continue to improve, but scientists should “not yet” fully replace traditional forecasting models with AI ones, according to Fischer.

He explains that accurate forecasts are “most needed” in the run-up to potential record-breaking extremes, because they are the trigger for early warning systems that help minimise damages caused by extreme weather.

Leonardo Olivetti is a PhD student at Uppsala University, who has published work on AI weather forecasting and was not involved in the study.

He tells Carbon Brief that “many other studies” have identified issues with using AI models for “extremes”, but this paper is novel for its specific focus on extremes.

Olivetti notes that AI models are already used alongside physics-based models at “some of the major weather forecasting centres around the world”. However, the study results suggest “caution against relying too heavily on these [AI] models”, he says.

Prof Martin Schultz, a professor in computational earth system science at the University of Cologne who was not involved in the study, tells Carbon Brief that the results of the analysis are “very interesting, but not too surprising”.

He adds that the study “justifies the continued use of classical numerical weather models in operational forecasts, in spite of their tremendous computational costs”.

§ Advances in forecasting

The field of AI weather forecasting is evolving rapidly.

Olivetti notes that the three AI models tested in the study are an “older generation” of AI models. In the last two years, newer “probabilistic” forecast models have emerged that “claim to better capture extremes”, he explains.

The three AI models used in the analysis are “deterministic”, meaning that they only simulate one possible future outcome.

In contrast, probabilistic models “create several possible future states of the weather” and are therefore more likely to capture record-breaking extremes, study author Engelke tells Carbon Brief.
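The difference can be sketched in Python with invented numbers. A deterministic forecast gives one value that either crosses a record or does not, while a set of possible outcomes (an ensemble) lets forecasters estimate the chance of a record. The helper name and member values are hypothetical, and real probabilistic AI models generate their outcomes in more sophisticated ways.

```python
def exceedance_probability(ensemble, record):
    """Fraction of ensemble members that forecast a value above the record."""
    return sum(1 for member in ensemble if member > record) / len(ensemble)


record = 38.0          # hypothetical previous record (C)
deterministic = 37.2   # a single best-guess forecast
ensemble = [36.8, 37.5, 38.4, 39.1, 37.0, 38.9, 36.5, 38.2, 37.7, 40.0]

print(deterministic > record)                    # False
print(exceedance_probability(ensemble, record))  # 0.5
```

Here the single forecast suggests no record, while half of the possible outcomes break it, which is the kind of signal an early warning system could act on.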

Engelke says it is “important” to evaluate the newer generation of models for their ability to forecast weather extremes.

He adds that this paper has set out a “protocol” for testing the ability of AI models to predict unprecedented extreme events, which he hopes other researchers will go on to use.

The study says that another “promising direction” for future research is to develop models that combine aspects of traditional, physics-based weather forecasts with AI models.

Engelke says this approach would offer the “best of both worlds”, as it would combine the ability of physics-based models to simulate record-breaking weather with the computational efficiency of AI models.

Dr Kyle Hilburn, a research scientist at Colorado State University, notes that the study does not address extreme rainfall, which he says “presents challenges for both modelling and observing”. This, he says, is an “important” area for future research.
