MSML612 - Designing a Transformer-Based Time Series Forecasting Model for Multi-Step, Multi-Variable Weather Prediction
Accurate and timely weather forecasting plays a vital role in multiple sectors, including agriculture, transportation, public safety, and energy management.
Traditionally, weather forecasting has relied on numerical weather prediction (NWP) [1]. The leading operational medium-range weather forecast globally is the Ensemble Prediction System (ENS), developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) [2]. This system uses NWP and generates multiple forecast scenarios by introducing small variations into the model's initial conditions.
Machine-learning-based methods have been developed in recent years. The GraphCast [3] and GenCast [4] models designed by Google DeepMind were the first to consistently outperform the numerical models.
GenCast, Google DeepMind's latest prediction model, is based on Graph Neural Networks (GNNs) trained on 18 measurements at 13 vertical pressure levels for a global set of locations at 0.25-degree resolution in latitude and longitude.
The models are trained on the ERA5 reanalysis dataset provided by ECMWF [5], which reconstructs past global weather states from 1979 to 2018.
Numerical weather prediction (NWP) forecasts the weather by solving mathematical equations that represent atmospheric fluid dynamics. These equations are solved numerically on computers to generate weather forecasts. Forecasters often analyze and compare multiple NWP outputs, such as regional versus global models or models from different forecasting centers. To improve reliability, ensemble techniques combine results from several model runs, each with slightly varied initial conditions or model settings, using statistical and graphical approaches [7].
An NWP forecast depends critically on two factors: the initial conditions and the boundary conditions. Initial conditions define the starting point of the simulation and are determined through a process called data assimilation, in which observations from satellites and weather stations are integrated into the model's framework [7].
The Ensemble Prediction System (ENS) was created by the European Centre for Medium-Range Weather Forecasts (ECMWF) and is widely regarded as one of the leading, and most reliable, global weather forecasting systems. The ENS runs the ECMWF model multiple times, each with slightly different starting conditions, to generate a range of possible weather outcomes. Specifically, the monthly forecasting system is executed 51 times: one "control" forecast uses the standard initial conditions from ECMWF's ocean and atmospheric analyses, while the other 50 runs use perturbed initial states. This approach allows the ENS to capture and represent a broad spectrum of potential future weather scenarios [8].
Some characteristics of this model:
- Resolutions up to tens of km
- The forecast depth of the weather model is 6 days
- The forecast step is 3 hours
- The update frequency is 4 times a day
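The ensemble principle above can be illustrated with a toy chaotic system standing in for a full NWP model. Everything in this sketch is a placeholder: the logistic map plays the role of the forecast model, and the perturbation scale is arbitrary; only the member count (one control plus 50 perturbed runs) mirrors the ENS setup.

```python
import numpy as np

rng = np.random.default_rng(42)

def toy_model(x0, steps=20, r=3.7):
    """Toy chaotic dynamics (logistic map), a stand-in for an NWP model run."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

control_ic = 0.5                     # "analysis" initial condition
members = [toy_model(control_ic)]    # control forecast
for _ in range(50):                  # 50 perturbed members, as in the ENS
    members.append(toy_model(control_ic + rng.normal(scale=1e-4)))

members = np.array(members)
print(len(members))  # 51
```

Because the dynamics are chaotic, the tiny initial perturbations produce a visible spread across the 51 members, which is exactly how an ensemble expresses forecast uncertainty.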
The principal weakness of NWP is its immense computational cost. The complexity of the models requires massive supercomputers with millions of processors to perform the necessary calculations within operational time constraints. Global models, which are necessary for medium-range forecasting, typically operate at resolutions too coarse to capture fine-scale local weather phenomena. This high barrier to entry makes state-of-the-art NWP inaccessible for many smaller, local weather agencies that operate at the state or county level [7].
With the advent of AI technologies, the AI-based weather prediction field has seen rapid growth. AI models do not explicitly solve fundamental physical laws; instead, they learn from vast amounts of historical weather data. This information comes from decades of reanalysis data, such as ECMWF's ERA5 dataset, which reconstructs the past global weather state by assimilating historical observations [9].
Developed by Google DeepMind, GraphCast is a deep learning model that leverages a Graph Neural Network (GNN) architecture [10]. GraphCast operates autoregressively: given the state of the weather at the current time t and six hours prior (t−6h), it predicts the weather state six hours later (t+6h). This process is rolled forward iteratively to produce a 10-day forecast. The model is trained on the ERA5 reanalysis dataset provided by ECMWF [5].
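The autoregressive rollout can be sketched as follows. The `step` function here is a hypothetical stand-in for the learned GraphCast network (a trivial linear extrapolation), not the actual model; only the rollout structure, mapping (t−6h, t) to t+6h and iterating, reflects the description above.

```python
import numpy as np

def step(state_prev, state_curr):
    """Stand-in for the learned one-step model: maps (t-6h, t) -> t+6h.
    Here a hypothetical linear extrapolation, not GraphCast itself."""
    return 2 * state_curr - state_prev

def rollout(state_prev, state_curr, n_steps=40):
    """Autoregressive rollout: 40 six-hour steps = a 10-day forecast."""
    forecast = []
    for _ in range(n_steps):
        state_next = step(state_prev, state_curr)
        forecast.append(state_next)
        # Slide the two-state window forward for the next step
        state_prev, state_curr = state_curr, state_next
    return forecast

f = rollout(np.zeros(4), np.ones(4))
print(len(f))  # 40
```

The key property is that each prediction is fed back as input, so errors can compound over the 40 steps; this is one reason rollout length matters when training such models.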
GraphCast predicted weather conditions more accurately than the High Resolution Forecast (HRES) produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). It was able to produce this 10-day forecast in less than a minute on a single Google TPU v4 machine [11].
Unlike ensemble NWP systems, this model does not provide uncertainty estimates.
GenCast builds on DeepMind's previous AI weather models such as GraphCast. GraphCast is deterministic: it produces only a single best-estimate prediction. GenCast, on the other hand, generates 50 or more forecasts by introducing small variations into the model's initial conditions through a diffusion process, enabling the model to express uncertainty.
This model represents the Earth's surface with evenly spaced nodes on an icosahedral mesh, eliminating distortions near the poles. It predicts weather conditions at a granularity of 0.25 degrees in latitude and longitude [4].
FourCastNetv2 is a leading-edge machine learning model designed for weather prediction. It utilizes Fourier Neural Operators (FNOs), which excel at efficiently capturing long-range relationships in spatio-temporal data by learning global convolutions.
FNOs rely on the discrete Fourier transform (DFT). However, DFTs cause visual and spectral artifacts, as well as pronounced dissipation, when learning operators in spherical coordinates, since they incorrectly assume a flat geometry. For this reason, the authors of FourCastNetv2 developed the Spherical Fourier Neural Operator (SFNO), built on a spherical harmonic transform [13].
This technique is focused on deterministic weather forecasting [12].
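The core FNO idea, that a global convolution becomes a pointwise product in Fourier space, can be sketched in one dimension. The signal, kernel, and sizes below are illustrative; a real FNO layer makes the spectral weights learnable rather than deriving them from a fixed kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
signal = rng.normal(size=n)
kernel = rng.normal(size=n)

# Global (circular) convolution as a pointwise product in Fourier space:
# this is the operation an FNO layer parameterizes with learned weights.
conv_fft = np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))

# Direct circular convolution, for comparison
conv_direct = np.array([sum(signal[j] * kernel[(i - j) % n] for j in range(n))
                        for i in range(n)])

print(np.allclose(conv_fft, conv_direct))  # True
```

The FFT route costs O(n log n) versus O(n²) for the direct sum, which is why FNOs capture long-range (here, global) interactions so cheaply; the spherical variant replaces the FFT with a transform that respects the sphere's geometry.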
Pangu-Weather, developed by Huawei, is based on a Transformer architecture. The team behind this model designed a three-dimensional (3D) Earth-specific transformer (3DEST) architecture in which height is treated as just another dimension. Like the other models, it predicts weather conditions deterministically at a granularity of 0.25 degrees in latitude and longitude [14].
The current state of weather forecasting is defined by a trade-off between computational cost and fidelity. NWP models are robust but sometimes computationally prohibitive and often too coarse for precise local forecasting. Global AI models provide remarkable speed and accuracy at scale but may lack the resolution required for specific local contexts.
The ERA5 Reanalysis dataset is provided by ECMWF. The parameters included in this dataset and their meaning can be found here: ECMWF - Copernicus Knowledge Base - ERA5: data documentation
We retrieved data from the ECMWF Climate Data Store using the API provided by this organization to evaluate the dataset. This service provides more than 100 fields with a time granularity of 1 hour and a spatial granularity of 0.25 degrees in latitude and longitude.
WeatherBench2 is a benchmark dataset for weather forecasting models. It's available at: WeatherBench2
The dataset is based on the ERA5 reanalysis dataset provided by ECMWF and has been pre-processed for easier consumption: measurements are normalized (e.g., temperatures are converted from Kelvin to Celsius), redundant variables are removed, and the time granularity is limited to 6 hours.
This is the dataset leveraged by Google DeepMind's GraphCast and GenCast, and it will be used for our demonstrations.
The WeatherBench2 ERA5 data set has the following structure:
- Dimensions
- Time: Every 6 hours
- Latitude: 0.25 degrees (721 values)
- Longitude: 0.25 degrees (1440 values)
- Level: 13 levels
- 62 Predictors
- 10m_u_component_of_wind, 10m_v_component_of_wind, 10m_wind_speed, 2m_dewpoint_temperature, 2m_temperature, mean_sea_level_pressure, mean_surface_latent_heat_flux, mean_surface_net_long_wave_radiation_flux, mean_surface_net_short_wave_radiation_flux, mean_surface_sensible_heat_flux, mean_top_downward_short_wave_radiation_flux, mean_top_net_long_wave_radiation_flux, mean_top_net_short_wave_radiation_flux, mean_vertically_integrated_moisture_divergence, potential_vorticity, relative_humidity, sea_ice_cover, sea_surface_temperature, slope_of_sub_gridscale_orography, snow_depth, soil_type, specific_humidity, standard_deviation_of_filtered_subgrid_orography, standard_deviation_of_orography, surface_pressure, temperature, total_cloud_cover, total_column_vapor, total_column_water, total_column_water_vapour, total_precipitation_12hr, total_precipitation_24hr, total_precipitation_6hr, type_of_high_vegetation, type_of_low_vegetation, u_component_of_wind, v_component_of_wind, vertical_velocity, volumetric_soil_water_layer_1, volumetric_soil_water_layer_2, volumetric_soil_water_layer_3, volumetric_soil_water_layer_4, vorticity, wind_speed
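Some of these predictors are derived quantities. For example, `10m_wind_speed` is the magnitude of the `10m_u_component_of_wind` and `10m_v_component_of_wind` vectors, as sketched below (the sample values are illustrative):

```python
import numpy as np

def wind_speed(u, v):
    """Wind speed magnitude from u (eastward) and v (northward) components, in m/s."""
    return np.sqrt(u ** 2 + v ** 2)

u = np.array([3.0, 0.0])
v = np.array([4.0, -2.0])
print(wind_speed(u, v))  # [5. 2.]
```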
Models like GraphCast and GenCast make global-only predictions. Inference with these models requires GPU computational capabilities and can produce 6-hour predictions in less than one hour. Training these models is extremely expensive, as it must process a global dataset spanning roughly 40 years.
The complexity and the large amount of data required to train these models, combined with the computational limitations of weather agencies like ECMWF, make a Transformer-based global model impractical. For this reason, the existing models leverage a Graph Neural Network (GNN) architecture. The complexity of a GNN grows only with the number of edges in the graph, which is a great advantage given the icosahedral mesh structure of the data. GNNs perform well and are specially suited to this kind of data, but they miss some of the advantages of a Transformer, such as global attention and the ability to capture long-range dependencies.
The disadvantage of these models is that the data is centralized by a European agency, leaving no room to adapt the model to the particularities of specific regions or to incorporate data captured by local weather stations.
Across the globe, weather predictions are managed by local agencies which, in many cases, aren't interconnected and work in isolation. For these cases, it's important to provide a low-cost option that focuses only on the geographic areas covered by these institutions.
The goal of this project is to develop a localized model that is able to make local weather predictions with a high degree of accuracy and speed, while providing a low-cost framework that would enable these agencies to train their own models with their own data.
For this kind of dataset, a GNN may not be the best option, as the benefits of the icosahedral mesh input representation are not fully exploited.
The Transformer architecture, originally proposed for natural language processing, has demonstrated strong performance in time series prediction tasks due to its self-attention mechanism and ability to model complex temporal dependencies.
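As a minimal sketch of the mechanism, scaled dot-product self-attention can be written in a few lines. Identity query/key/value projections are assumed here for brevity; a real Transformer layer learns these projections and uses multiple heads.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence x of shape
    (seq_len, d_model), with identity Q/K/V projections for brevity."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over time steps
    return weights @ x                               # weighted mix of all steps

x = np.arange(12, dtype=float).reshape(4, 3)  # 4 time steps, 3 features
out = self_attention(x)
print(out.shape)  # (4, 3)
```

Because every output step is a weighted combination of all input steps, attention connects any two time steps directly, which is the property that makes Transformers attractive for long-range temporal dependencies.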
We plan to train Transformers on specific geolocations to provide a more precise and less computationally expensive prediction model to weather agencies that work at the state or county level.
The model aims to forecast up to 3 days of weather variables, such as temperature, humidity, and precipitation, based on the ERA5 dataset (covering 1979 to 2023), taking the last 10 days of weather information as input.
The proposed model not only contributes to improving short-term local weather prediction but also evaluates the effectiveness of Transformers in real-world multi-variable time series forecasting scenarios, offering valuable insights for future research and industrial applications.
First, we trained and evaluated a basic Graph Neural Network to understand its behavior. This test generates a synthetic dataset on a 5x5 graph, where each node provides temperature, humidity, and precipitation data [15].
The model follows this specification:
- Input Graph Convolutional Layer: ReLU activation, Convolutional filters from 3 input to 8 hidden features
- Output Graph Convolutional Layer: ReLU activation, Convolutional filters from 8 input to 1 output feature
- Mean Squared Error (MSE) loss function calculated against the local temperature
- Adam optimizer with a learning rate of 0.01
- Trained for 100 epochs with a batch size of 10
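The forward pass of this two-layer graph convolution can be sketched in plain NumPy. This is a hedged illustration, not the project code: it uses the symmetric adjacency normalization of Kipf and Welling in place of a GNN library, random untrained weights, and an illustrative 5x5 grid graph.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in Kipf & Welling GCNs."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

# 5x5 grid graph: 25 nodes, edges between horizontal/vertical neighbors
n_side, n = 5, 25
adj = np.zeros((n, n))
for i in range(n_side):
    for j in range(n_side):
        k = i * n_side + j
        if j + 1 < n_side:
            adj[k, k + 1] = adj[k + 1, k] = 1
        if i + 1 < n_side:
            adj[k, k + n_side] = adj[k + n_side, k] = 1

a_norm = normalize_adjacency(adj)
w1 = rng.normal(size=(3, 8))  # 3 input features -> 8 hidden features
w2 = rng.normal(size=(8, 1))  # 8 hidden features -> 1 output (temperature)

x = rng.normal(size=(n, 3))   # temperature, humidity, precipitation per node
h = relu(a_norm @ x @ w1)     # first graph convolutional layer, ReLU
out = relu(a_norm @ h @ w2)   # second graph convolutional layer, ReLU
print(out.shape)  # (25, 1)
```

Each layer mixes every node's features with its neighbors' via the normalized adjacency, which is the sense in which a GCN's cost scales with the number of edges rather than with all node pairs.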
This model predicts the temperature. The accuracy is poor, as expected, since we worked with a synthetic dataset in which there are no real patterns to detect.
We ran inference of the GraphCast model on the WeatherBench2 dataset. The results are accurate, as expected [16].
We trained a relatively small Transformer model with a reduced dataset to understand its behavior.
Training Dataset:
- ERA5 Reanalysis Dataset
- Latitude and Longitude limited to the United States
- Time period: January, February and March 2024 (first three days of each month)
- Granularity 6 hours and 0.25 degrees of latitude and longitude
- Features:
- t2m: 2-meter temperature
- u10: 10-meter u-component of wind
- v10: 10-meter v-component of wind
- tp: Total precipitation
Model Specification:
- Linear input layer
- Transformer Encoder Layer: 2 attention heads, 32 hidden features
- Linear output layer
- Mean Squared Error (MSE) loss function
- Adam optimizer with a learning rate of 0.01
- Trained for 10 epochs with a batch size of 10
- Output prediction is 2-meter temperature
We compared the next-6-hour predictions over 10 data points and obtained the following results:
| Predicted_t2m | Actual_t2m |
|---|---|
| 274.806458 | 291.328003 |
| 285.862549 | 291.331909 |
| 274.311859 | 291.960205 |
| 274.719574 | 278.831909 |
| 274.827332 | 279.126465 |
| 276.147614 | 277.822144 |
| 274.903656 | 263.065918 |
| 280.033905 | 295.828003 |
| 275.402252 | 265.808472 |
| 277.969025 | 292.344604 |
Temperature is expressed in kelvin. The error in the predictions is noticeable, sometimes around 15 K. Despite this, we can see that the model was able to make sense of the data with a very small dataset, a reduced architecture, and a short training time.
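The table above can be summarized with standard error metrics. The values are copied from the table; the short helper below is our own illustration, not part of any model code.

```python
import numpy as np

predicted = np.array([274.806458, 285.862549, 274.311859, 274.719574, 274.827332,
                      276.147614, 274.903656, 280.033905, 275.402252, 277.969025])
actual = np.array([291.328003, 291.331909, 291.960205, 278.831909, 279.126465,
                   277.822144, 263.065918, 295.828003, 265.808472, 292.344604])

errors = predicted - actual
mae = np.abs(errors).mean()           # mean absolute error
rmse = np.sqrt((errors ** 2).mean())  # root mean squared error
print(f"MAE: {mae:.2f} K, RMSE: {rmse:.2f} K")  # MAE: 10.13 K, RMSE: 11.58 K
```

RMSE exceeding MAE indicates that a few large misses (the roughly 15 K errors visible in the table) dominate the error, consistent with the qualitative observation above.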
We'll build a more complex model to be trained on the Zaratan HPC cluster, focusing the training on the DMV area.
Despite focusing on a small region like the DMV area, the input latitude and longitude ranges have to be expanded to cover a wider area. This is because of the nature of weather prediction, which depends on surrounding areas due to incoming storms, winds, and air currents.
```python
import cdsapi
import xarray as xr

# Retrieve ERA5 single-level fields over the continental United States
c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'variable': [
            '2m_temperature', '10m_u_component_of_wind',
            '10m_v_component_of_wind', 'total_precipitation',
        ],
        'year': '2024',
        'month': ['01', '02', '03'],
        'day': ['01', '02', '03'],
        'time': ['00:00', '06:00', '12:00', '18:00'],
        'format': 'netcdf',
        'area': [50, -130, 24, -66],  # North, West, South, East bounds
    },
    'download.nc'
)

# The CDS delivers instantaneous and accumulated fields in separate
# NetCDF streams, which are opened individually
ds_instant = xr.open_dataset('data_stream-oper_stepType-instant.nc')
ds_accum = xr.open_dataset('data_stream-oper_stepType-accum.nc')
```
```python
import torch.nn as nn

class WeatherTransformer(nn.Module):
    def __init__(self, feature_dim, d_model=32, nhead=2, num_layers=1):
        super().__init__()
        # Project the input features into the model dimension
        self.input_fc = nn.Linear(feature_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(
            encoder_layer, num_layers=num_layers)
        # Regress a single output (2-meter temperature)
        self.output_fc = nn.Linear(d_model, 1)

    def forward(self, x):
        x = self.input_fc(x)             # (batch, feature_dim) -> (batch, d_model)
        x = x.unsqueeze(1)               # add a sequence dimension of length 1
        x = self.transformer_encoder(x)
        x = x.squeeze(1)                 # drop the sequence dimension
        out = self.output_fc(x)
        return out
```
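For completeness, the training setup described earlier (MSE loss, Adam with learning rate 0.01, 10 epochs, batch size 10) can be sketched as below. The data and the small feed-forward stand-in model are synthetic placeholders of our own; in the real pipeline, ERA5 feature tensors and `WeatherTransformer` take their places.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data: 100 samples, 4 features (t2m, u10, v10, tp),
# with next-step t2m as the regression target
X = torch.randn(100, 4)
y = torch.randn(100, 1)

# Small feed-forward stand-in; swap in WeatherTransformer(feature_dim=4) in practice
model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(10):                 # 10 epochs, as in the spec
    perm = torch.randperm(len(X))
    for i in range(0, len(X), 10):      # batch size 10
        idx = perm[i:i + 10]
        optimizer.zero_grad()
        loss = criterion(model(X[idx]), y[idx])
        loss.backward()
        optimizer.step()

print(loss.item() >= 0.0)  # True
```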
1. Bauer, P., Thorpe, A. & Brunet, G. The quiet revolution of numerical weather prediction. Nature 525, 47–55 (2015).
2. ECMWF. IFS Documentation CY46R1. Part V: Ensemble Prediction System (ECMWF, 2019).
3. Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
4. Price, I., Sanchez-Gonzalez, A., Alet, F. et al. Probabilistic weather forecasting with machine learning. Nature 637, 84–90 (2024).
5. Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).
6. WeatherBench2 Documentation. https://weatherbench2.readthedocs.io/ (accessed 2024).
7. National Weather Service. Numerical Weather Prediction (U.S. Department of Commerce, National Oceanic and Atmospheric Administration, 2012).
8. Windy.app. ECMWF-ENS. The most accurate ensemble weather forecast model. https://windy.app/blog/ecmwf-ens-weather-forecast-model.html (2023).
9. Bassi, M. Google reveals new AI model that predicts weather better than the best traditional forecasts. Smithsonian Magazine. https://www.smithsonianmag.com/smart-news/google-reveals-new-ai-model-that-predicts-weather-better-than-the-best-traditional-forecasts-180985608/ (2024).
10. Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
11. Lam, R. et al. GraphCast: AI model for faster and more accurate global weather forecasting. Google DeepMind. https://deepmind.google/discover/blog/graphcast-ai-model-for-faster-and-more-accurate-global-weather-forecasting/ (2023).
12. Kurth, T. et al. FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators. In Proceedings of the Platform for Advanced Scientific Computing Conference 1–11 (ACM, 2023).
13. Bonev, B. et al. Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere. In Proceedings of the 40th International Conference on Machine Learning 202, 2806–2823 (PMLR, 2023).
14. Bi, K. et al. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).
15. Vemula, P. AI for Weather Forecasting: How DeepMind's Graphcast is Setting New Standards. https://medium.com/@prasanthvemula1729/ai-for-weather-forecasting-how-deepminds-graphcast-is-setting-new-standards-0f3ec962e650 (2024).
16. DeepMind. GraphCast: AI model for faster and more accurate global weather forecasting. https://github.com/google-deepmind/graphcast (2023).

Additional resources: https://towardsdatascience.com/graphcast-how-to-get-things-done-f2fd5630c5fb/ and https://github.com/abhinavyesss/graphcast-predict