MSML612 - Designing a Transformer-Based Time Series Forecasting Model for Multi-Step, Multi-Variable Weather Prediction
Accurate and timely weather forecasting plays a vital role in multiple sectors, including agriculture, transportation, public safety, and energy management.
Traditionally, weather forecasting has relied on numerical weather prediction (NWP) [1]. The leading operational medium-range weather forecast globally is the Ensemble Prediction System (ENS), developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) [2]. This system uses NWP and generates multiple forecast scenarios by introducing small variations into the model's initial conditions.
Machine-learning-based methods have been developed in recent years. The GraphCast [3] and GenCast [4] models designed by Google DeepMind were the first to consistently outperform the numerical models.
GenCast, Google DeepMind's latest prediction model, is based on Graph Neural Networks (GNNs) trained on 18 measurements at 13 vertical pressure levels for a global set of locations at 0.25-degree resolution in latitude and longitude.
The models are trained on the ERA5 reanalysis dataset provided by ECMWF [5], which reconstructs past global weather states from 1979 to 2018.
Numerical weather prediction (NWP) forecasts the weather by solving mathematical equations that represent atmospheric fluid dynamics. These equations are solved numerically on computers to generate weather forecasts. Forecasters often analyze and compare multiple NWP outputs, such as regional versus global models or models from different forecasting centers. To improve reliability, ensemble techniques combine results from several model runs, each with slightly varied initial conditions or model settings, using statistical and graphical approaches [7].
An NWP forecast depends critically on two factors: the initial conditions and the boundary conditions. Initial conditions define the starting point of the simulation and are determined through a process called data assimilation, in which observations from satellites and weather stations are integrated into the model's framework [7].
The Ensemble Prediction System (ENS) was created by the European Centre for Medium-Range Weather Forecasts (ECMWF) and is widely regarded as one of the leading, and most reliable, global weather forecasting systems. The ENS runs the ECMWF model multiple times, each with slightly different starting conditions, to generate a range of possible weather outcomes. Specifically, the monthly forecasting system is executed 51 times: one "control" forecast uses the standard initial conditions from ECMWF's ocean and atmospheric analyses, while the other 50 runs use perturbed initial states. This approach allows the ENS to capture and represent a broad spectrum of potential future weather scenarios [8].
Some characteristics of this model:
- Resolutions up to tens of km
- The forecast depth of the weather model is 6 days
- The forecast step is 3 hours
- The update frequency is 4 times a day
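The ensemble principle above can be illustrated with a toy chaotic system standing in for a full NWP model. Everything in this sketch is a placeholder: the logistic map plays the role of the forecast model, and the perturbation scale is arbitrary; only the member count (one control plus 50 perturbed runs) mirrors the ENS setup.

```python
import numpy as np

rng = np.random.default_rng(42)

def toy_model(x0, steps=20, r=3.7):
    """Toy chaotic dynamics (logistic map), a stand-in for an NWP model run."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

control_ic = 0.5                     # "analysis" initial condition
members = [toy_model(control_ic)]    # control forecast
for _ in range(50):                  # 50 perturbed members, as in the ENS
    members.append(toy_model(control_ic + rng.normal(scale=1e-4)))

members = np.array(members)
print(len(members))  # 51
```

Because the dynamics are chaotic, the tiny initial perturbations produce a visible spread across the 51 members, which is exactly how an ensemble expresses forecast uncertainty.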
The principal weakness of NWP is its immense computational cost. The complexity of the models requires massive supercomputers with millions of processors to perform the necessary calculations within operational time constraints. Global models, which are necessary for medium-range forecasting, typically operate at resolutions too coarse to capture fine-scale local weather phenomena. This high barrier to entry makes state-of-the-art NWP inaccessible for many smaller, local weather agencies that operate at the state or county level [7].
With the advent of AI technologies, the AI-based weather prediction field has seen rapid growth. AI models do not explicitly solve fundamental physical laws; instead, they learn from vast amounts of historical weather data. This information comes from decades of reanalysis data, such as ECMWF's ERA5 dataset, which reconstructs the past global weather state by assimilating historical observations [9].
Developed by Google DeepMind, GraphCast is a deep learning model that leverages a Graph Neural Network (GNN) architecture [10]. GraphCast operates autoregressively: given the state of the weather at the current time t and six hours prior (t−6h), it predicts the weather state six hours later (t+6h). This process is rolled forward iteratively to produce a 10-day forecast. The model is trained on the ERA5 reanalysis dataset provided by ECMWF [5].
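The autoregressive rollout can be sketched as follows. The `step` function here is a hypothetical stand-in for the learned GraphCast network (a trivial linear extrapolation), not the actual model; only the rollout structure, mapping (t−6h, t) to t+6h and iterating, reflects the description above.

```python
import numpy as np

def step(state_prev, state_curr):
    """Stand-in for the learned one-step model: maps (t-6h, t) -> t+6h.
    Here a hypothetical linear extrapolation, not GraphCast itself."""
    return 2 * state_curr - state_prev

def rollout(state_prev, state_curr, n_steps=40):
    """Autoregressive rollout: 40 six-hour steps = a 10-day forecast."""
    forecast = []
    for _ in range(n_steps):
        state_next = step(state_prev, state_curr)
        forecast.append(state_next)
        # Slide the two-state window forward for the next step
        state_prev, state_curr = state_curr, state_next
    return forecast

f = rollout(np.zeros(4), np.ones(4))
print(len(f))  # 40
```

The key property is that each prediction is fed back as input, so errors can compound over the 40 steps; this is one reason rollout length matters when training such models.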
GraphCast predicted weather conditions more accurately than the High Resolution Forecast (HRES) produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). It was able to produce this 10-day forecast in less than a minute on a single Google TPU v4 machine [11].
Unlike ensemble NWP systems, this model does not provide uncertainty estimates.
GenCast builds on DeepMind's previous AI weather models such as GraphCast. GraphCast is deterministic: it produces only a single best-estimate prediction. GenCast, on the other hand, generates 50 or more forecasts by introducing small variations into the model's initial conditions through a diffusion process, enabling the model to express uncertainty.
This model represents the Earth's surface with evenly spaced nodes on an icosahedral mesh, eliminating distortions near the poles. It predicts weather conditions at a granularity of 0.25 degrees in latitude and longitude [4].
FourCastNetv2 is a leading-edge machine learning model designed for weather prediction. It utilizes Fourier Neural Operators (FNOs), which excel at efficiently capturing long-range relationships in spatio-temporal data by learning global convolutions.
FNOs rely on the discrete Fourier transform (DFT). However, DFTs cause visual and spectral artifacts, as well as pronounced dissipation, when learning operators in spherical coordinates, since they incorrectly assume a flat geometry. For this reason, the authors of FourCastNetv2 developed the Spherical Fourier Neural Operator (SFNO), built on a spherical harmonic transform [13].
This technique is focused on deterministic weather forecasting [12].
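The core FNO idea, that a global convolution becomes a pointwise product in Fourier space, can be sketched in one dimension. The signal, kernel, and sizes below are illustrative; a real FNO layer makes the spectral weights learnable rather than deriving them from a fixed kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
signal = rng.normal(size=n)
kernel = rng.normal(size=n)

# Global (circular) convolution as a pointwise product in Fourier space:
# this is the operation an FNO layer parameterizes with learned weights.
conv_fft = np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))

# Direct circular convolution, for comparison
conv_direct = np.array([sum(signal[j] * kernel[(i - j) % n] for j in range(n))
                        for i in range(n)])

print(np.allclose(conv_fft, conv_direct))  # True
```

The FFT route costs O(n log n) versus O(n²) for the direct sum, which is why FNOs capture long-range (here, global) interactions so cheaply; the spherical variant replaces the FFT with a transform that respects the sphere's geometry.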
Pangu-Weather, developed by Huawei, is based on a Transformer architecture. The team behind this model designed a three-dimensional (3D) Earth-specific transformer (3DEST) architecture in which height is treated as just another dimension. Like the other models, it predicts weather conditions deterministically at a granularity of 0.25 degrees in latitude and longitude [14].
The current state of weather forecasting is defined by a trade-off between computational cost and fidelity. NWP models are robust but sometimes computationally prohibitive and often too coarse for precise local forecasting. Global AI models provide remarkable speed and accuracy at scale but may lack the resolution required for specific local contexts.
The ERA5 Reanalysis dataset is provided by ECMWF. The parameters included in this dataset and their meaning can be found here: ECMWF - Copernicus Knowledge Base - ERA5: data documentation
We retrieved data from the ECMWF Climate Data Store using the API provided by this organization to evaluate the dataset. This service provides more than 100 fields with a time granularity of 1 hour and a spatial granularity of 0.25 degrees in latitude and longitude.
WeatherBench2 is a benchmark dataset for weather forecasting models. It's available at: WeatherBench2
The dataset is based on the ERA5 reanalysis dataset provided by ECMWF and has been pre-processed for easier consumption: measurements are normalized (e.g., temperatures are converted from Kelvin to Celsius), redundant variables are removed, and the time granularity is limited to 6 hours.
This is the dataset leveraged by Google DeepMind's GraphCast and GenCast, and it will be used for our demonstrations.
The WeatherBench2 ERA5 data set has the following structure:
- Dimensions
- Time: Every 6 hours
- Latitude: 0.25 degrees (721 values)
- Longitude: 0.25 degrees (1440 values)
- Level: 13 levels
- 62 Predictors
- 10m_u_component_of_wind, 10m_v_component_of_wind, 10m_wind_speed, 2m_dewpoint_temperature, 2m_temperature, mean_sea_level_pressure, mean_surface_latent_heat_flux, mean_surface_net_long_wave_radiation_flux, mean_surface_net_short_wave_radiation_flux, mean_surface_sensible_heat_flux, mean_top_downward_short_wave_radiation_flux, mean_top_net_long_wave_radiation_flux, mean_top_net_short_wave_radiation_flux, mean_vertically_integrated_moisture_divergence, potential_vorticity, relative_humidity, sea_ice_cover, sea_surface_temperature, slope_of_sub_gridscale_orography, snow_depth, soil_type, specific_humidity, standard_deviation_of_filtered_subgrid_orography, standard_deviation_of_orography, surface_pressure, temperature, total_cloud_cover, total_column_vapor, total_column_water, total_column_water_vapour, total_precipitation_12hr, total_precipitation_24hr, total_precipitation_6hr, type_of_high_vegetation, type_of_low_vegetation, u_component_of_wind, v_component_of_wind, vertical_velocity, volumetric_soil_water_layer_1, volumetric_soil_water_layer_2, volumetric_soil_water_layer_3, volumetric_soil_water_layer_4, vorticity, wind_speed
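Some of these predictors are derived quantities. For example, `10m_wind_speed` is the magnitude of the `10m_u_component_of_wind` and `10m_v_component_of_wind` vectors, as sketched below (the sample values are illustrative):

```python
import numpy as np

def wind_speed(u, v):
    """Wind speed magnitude from u (eastward) and v (northward) components, in m/s."""
    return np.sqrt(u ** 2 + v ** 2)

u = np.array([3.0, 0.0])
v = np.array([4.0, -2.0])
print(wind_speed(u, v))  # [5. 2.]
```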
Models like GraphCast and GenCast make global-only predictions. Inference with these models requires GPU computational capabilities and can produce 6-hour predictions in less than one hour. Training these models is extremely expensive, as it must process a global dataset spanning roughly 40 years.
The complexity and the large amount of data required to train these models, combined with the computational limitations of weather agencies like ECMWF, make a Transformer-based global model impractical. For this reason, the existing models leverage a Graph Neural Network (GNN) architecture. The complexity of a GNN grows only with the number of edges in the graph, which is a great advantage given the icosahedral mesh structure of the data. GNNs perform well and are specially suited to this kind of data, but they miss some of the advantages of a Transformer, such as global attention and the ability to capture long-range dependencies.
The disadvantage of these models is that the data is centralized by a European agency, leaving no room to adapt the model to the particularities of specific regions or to incorporate data captured by local weather stations.
Across the globe, weather predictions are managed by local agencies which, in many cases, aren't interconnected and work in isolation. For these cases, it's important to provide a low-cost option that focuses only on the geographic areas covered by these institutions.
The goal of this project is to develop a localized model that is able to make local weather predictions with a high degree of accuracy and speed, while providing a low-cost framework that would enable these agencies to train their own models with their own data.
For this kind of dataset, a GNN may not be the best option, as the benefits of the icosahedral mesh input representation are not fully exploited.
The Transformer architecture, originally proposed for natural language processing, has demonstrated strong performance in time series prediction tasks due to its self-attention mechanism and ability to model complex temporal dependencies.
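As a minimal sketch of the mechanism, scaled dot-product self-attention can be written in a few lines. Identity query/key/value projections are assumed here for brevity; a real Transformer layer learns these projections and uses multiple heads.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence x of shape
    (seq_len, d_model), with identity Q/K/V projections for brevity."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over time steps
    return weights @ x                               # weighted mix of all steps

x = np.arange(12, dtype=float).reshape(4, 3)  # 4 time steps, 3 features
out = self_attention(x)
print(out.shape)  # (4, 3)
```

Because every output step is a weighted combination of all input steps, attention connects any two time steps directly, which is the property that makes Transformers attractive for long-range temporal dependencies.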
We plan to train Transformers on specific geolocations to provide a more precise and less computationally expensive prediction model to weather agencies that work at the state or county level.
The model aims to forecast up to 3 days of weather variables, such as temperature, humidity, and precipitation, based on the ERA5 dataset (covering 1979 to 2023), taking the last 10 days of weather information as input.
The proposed model not only contributes to improving short-term local weather prediction but also evaluates the effectiveness of Transformers in real-world multi-variable time series forecasting scenarios, offering valuable insights for future research and industrial applications.
First, we trained and evaluated a basic Graph Neural Network to understand its behavior. This test generates a synthetic dataset on a 5x5 graph, where each node provides temperature, humidity, and precipitation data [15].
The model follows this specification:
- Input Graph Convolutional Layer: ReLU activation, Convolutional filters from 3 input to 8 hidden features
- Output Graph Convolutional Layer: ReLU activation, Convolutional filters from 8 input to 1 output feature
- Mean Squared Error (MSE) loss function calculated against the local temperature
- Adam optimizer with a learning rate of 0.01
- Trained for 100 epochs with a batch size of 10
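The forward pass of this two-layer graph convolution can be sketched in plain NumPy. This is a hedged illustration, not the project code: it uses the symmetric adjacency normalization of Kipf and Welling in place of a GNN library, random untrained weights, and an illustrative 5x5 grid graph.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in Kipf & Welling GCNs."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

# 5x5 grid graph: 25 nodes, edges between horizontal/vertical neighbors
n_side, n = 5, 25
adj = np.zeros((n, n))
for i in range(n_side):
    for j in range(n_side):
        k = i * n_side + j
        if j + 1 < n_side:
            adj[k, k + 1] = adj[k + 1, k] = 1
        if i + 1 < n_side:
            adj[k, k + n_side] = adj[k + n_side, k] = 1

a_norm = normalize_adjacency(adj)
w1 = rng.normal(size=(3, 8))  # 3 input features -> 8 hidden features
w2 = rng.normal(size=(8, 1))  # 8 hidden features -> 1 output (temperature)

x = rng.normal(size=(n, 3))   # temperature, humidity, precipitation per node
h = relu(a_norm @ x @ w1)     # first graph convolutional layer, ReLU
out = relu(a_norm @ h @ w2)   # second graph convolutional layer, ReLU
print(out.shape)  # (25, 1)
```

Each layer mixes every node's features with its neighbors' via the normalized adjacency, which is the sense in which a GCN's cost scales with the number of edges rather than with all node pairs.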
This model predicts the temperature. The accuracy is poor, as expected, since we worked with a synthetic dataset in which there are no real patterns to detect.
We ran inference of the GraphCast model on the WeatherBench2 dataset. The results are accurate, as expected [16].
We trained a relatively small Transformer model with a reduced dataset to understand its behavior.
Training Dataset:
- ERA5 Reanalysis Dataset
- Latitude and Longitude limited to the United States
- Time period: January, February and March 2024 (first three days of each month)
- Granularity 6 hours and 0.25 degrees of latitude and longitude
- Features:
- t2m: 2-meter temperature
- u10: 10-meter u-component of wind
- v10: 10-meter v-component of wind
- tp: Total precipitation
Model Specification:
- Linear input layer
- Transformer Encoder Layer: 2 attention heads, 32 hidden features
- Linear output layer
- Mean Squared Error (MSE) loss function
- Adam optimizer with a learning rate of 0.01
- Trained for 10 epochs with a batch size of 10
- Output prediction is 2-meter temperature
We compared the next-6-hour predictions over 10 data points and obtained the following results:
| Predicted_t2m | Actual_t2m |
|---|---|
| 274.806458 | 291.328003 |
| 285.862549 | 291.331909 |
| 274.311859 | 291.960205 |
| 274.719574 | 278.831909 |
| 274.827332 | 279.126465 |
| 276.147614 | 277.822144 |
| 274.903656 | 263.065918 |
| 280.033905 | 295.828003 |
| 275.402252 | 265.808472 |
| 277.969025 | 292.344604 |
Temperature is expressed in kelvin. The error in the predictions is noticeable, sometimes around 15 K. Despite this, we can see that the model was able to make sense of the data with a very small dataset, a reduced architecture, and a short training time.
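The table above can be summarized with standard error metrics. The values are copied from the table; the short helper below is our own illustration, not part of any model code.

```python
import numpy as np

predicted = np.array([274.806458, 285.862549, 274.311859, 274.719574, 274.827332,
                      276.147614, 274.903656, 280.033905, 275.402252, 277.969025])
actual = np.array([291.328003, 291.331909, 291.960205, 278.831909, 279.126465,
                   277.822144, 263.065918, 295.828003, 265.808472, 292.344604])

errors = predicted - actual
mae = np.abs(errors).mean()           # mean absolute error
rmse = np.sqrt((errors ** 2).mean())  # root mean squared error
print(f"MAE: {mae:.2f} K, RMSE: {rmse:.2f} K")  # MAE: 10.13 K, RMSE: 11.58 K
```

RMSE exceeding MAE indicates that a few large misses (the roughly 15 K errors visible in the table) dominate the error, consistent with the qualitative observation above.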
We'll build a more complex model to be trained on the Zaratan HPC cluster, focusing the training on the DMV area.
Despite focusing on a small region like the DMV area, the input latitude and longitude ranges have to be expanded to cover a wider area. This is because of the nature of weather prediction, which depends on surrounding areas due to incoming storms, winds, and air currents.
```python
import cdsapi
import xarray as xr

# Retrieve ERA5 single-level fields over the continental United States
c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'variable': [
            '2m_temperature', '10m_u_component_of_wind',
            '10m_v_component_of_wind', 'total_precipitation',
        ],
        'year': '2024',
        'month': ['01', '02', '03'],
        'day': ['01', '02', '03'],
        'time': ['00:00', '06:00', '12:00', '18:00'],
        'format': 'netcdf',
        'area': [50, -130, 24, -66],  # North, West, South, East bounds
    },
    'download.nc'
)

# The CDS delivers instantaneous and accumulated fields in separate
# NetCDF streams, which are opened individually
ds_instant = xr.open_dataset('data_stream-oper_stepType-instant.nc')
ds_accum = xr.open_dataset('data_stream-oper_stepType-accum.nc')
```
```python
import torch.nn as nn

class WeatherTransformer(nn.Module):
    def __init__(self, feature_dim, d_model=32, nhead=2, num_layers=1):
        super().__init__()
        # Project the input features into the model dimension
        self.input_fc = nn.Linear(feature_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(
            encoder_layer, num_layers=num_layers)
        # Regress a single output (2-meter temperature)
        self.output_fc = nn.Linear(d_model, 1)

    def forward(self, x):
        x = self.input_fc(x)             # (batch, feature_dim) -> (batch, d_model)
        x = x.unsqueeze(1)               # add a sequence dimension of length 1
        x = self.transformer_encoder(x)
        x = x.squeeze(1)                 # drop the sequence dimension
        out = self.output_fc(x)
        return out
```
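For completeness, the training setup described earlier (MSE loss, Adam with learning rate 0.01, 10 epochs, batch size 10) can be sketched as below. The data and the small feed-forward stand-in model are synthetic placeholders of our own; in the real pipeline, ERA5 feature tensors and `WeatherTransformer` take their places.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data: 100 samples, 4 features (t2m, u10, v10, tp),
# with next-step t2m as the regression target
X = torch.randn(100, 4)
y = torch.randn(100, 1)

# Small feed-forward stand-in; swap in WeatherTransformer(feature_dim=4) in practice
model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(10):                 # 10 epochs, as in the spec
    perm = torch.randperm(len(X))
    for i in range(0, len(X), 10):      # batch size 10
        idx = perm[i:i + 10]
        optimizer.zero_grad()
        loss = criterion(model(X[idx]), y[idx])
        loss.backward()
        optimizer.step()

print(loss.item() >= 0.0)  # True
```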
1. Bauer, P., Thorpe, A. & Brunet, G. The quiet revolution of numerical weather prediction. Nature 525, 47–55 (2015).
2. ECMWF. IFS Documentation CY46R1. Part V: Ensemble Prediction System (ECMWF, 2019).
3. Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
4. Price, I., Sanchez-Gonzalez, A., Alet, F. et al. Probabilistic weather forecasting with machine learning. Nature 637, 84–90 (2024).
5. Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).
6. WeatherBench2 Documentation. https://weatherbench2.readthedocs.io/ (accessed 2024).
7. National Weather Service. Numerical Weather Prediction (U.S. Department of Commerce, National Oceanic and Atmospheric Administration, 2012).
8. Windy.app. ECMWF-ENS. The most accurate ensemble weather forecast model. https://windy.app/blog/ecmwf-ens-weather-forecast-model.html (2023).
9. Bassi, M. Google reveals new AI model that predicts weather better than the best traditional forecasts. Smithsonian Magazine. https://www.smithsonianmag.com/smart-news/google-reveals-new-ai-model-that-predicts-weather-better-than-the-best-traditional-forecasts-180985608/ (2024).
10. Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
11. Lam, R. et al. GraphCast: AI model for faster and more accurate global weather forecasting. Google DeepMind. https://deepmind.google/discover/blog/graphcast-ai-model-for-faster-and-more-accurate-global-weather-forecasting/ (2023).
12. Kurth, T. et al. FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators. In Proceedings of the Platform for Advanced Scientific Computing Conference 1–11 (ACM, 2023).
13. Bonev, B. et al. Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere. In Proceedings of the 40th International Conference on Machine Learning 202, 2806–2823 (PMLR, 2023).
14. Bi, K. et al. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).
15. Vemula, P. AI for Weather Forecasting: How DeepMind's Graphcast is Setting New Standards. https://medium.com/@prasanthvemula1729/ai-for-weather-forecasting-how-deepminds-graphcast-is-setting-new-standards-0f3ec962e650 (2024).
16. DeepMind. GraphCast: AI model for faster and more accurate global weather forecasting. https://github.com/google-deepmind/graphcast (2023).

Additional resources: https://towardsdatascience.com/graphcast-how-to-get-things-done-f2fd5630c5fb/ and https://github.com/abhinavyesss/graphcast-predict