GraphCast: Learning skillful medium-range global weather forecasting
Proposal
Main paper can be found here.
- ML-based weather simulator
- compares against HRES from ECMWF
- 10-day forecasts at 6-hour intervals, produced in under 60 seconds on a single TPUv4
- autoregressive model
- uses a GNN as its underlying model (in an encode-process-decode configuration), operating on a multi-mesh graph of the Earth
Summary
- ECMWF's operational pipeline has two parts: data assimilation and forecasting (HRES and ENS). This paper focuses on using ML-based techniques for the latter.
- HRES generates 10-day forecasts at 0.1-degree resolution, whereas this paper limits itself to 0.25 degrees.
- NWP (Numerical Weather Prediction) methods scale well with compute, whereas ML-based methods scale well with an increasing amount of data.
- Dataset is ERA5 from ECMWF
- uses 37 different vertical pressure levels (rather than altitude) as the vertical coordinate
- target variables
- 5 surface variables
- 6 atmospheric variables at 37 different vertical pressure levels
- total of $$5 + (37 \times 6) = 227$$ variables
- at any given point in time, these variables can change over the 1038240 grid locations on the Earth
- $$(\frac{180}{0.25} + 1) \times \frac{360}{0.25} = 721 \times 1440 = 1038240$$
- thus, there are a total of $$1038240 \times 227 = 235680480$$ inputs (sanity-checked below)
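A quick sanity check of these counts (plain arithmetic, nothing model-specific):

```python
# Sanity-check the grid and variable counts quoted above.
lat_points = int(180 / 0.25) + 1      # 721 latitudes (both poles included)
lon_points = int(360 / 0.25)          # 1440 longitudes (wraps around)
grid_points = lat_points * lon_points
assert grid_points == 1038240

num_vars = 5 + 37 * 6                 # 5 surface + 6 atmospheric x 37 levels
assert num_vars == 227
assert grid_points * num_vars == 235680480
```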
- internally, however, these locations are represented with a multi-mesh structure built from successively refined icosahedra, which has homogeneous spatial resolution over the globe
- this enables long-range interactions with just a few message-passing steps
- the coarse-mesh nodes are a subset of the finer-mesh nodes (refinement arithmetic sketched below)
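A minimal sketch of that refinement arithmetic, assuming the standard icosahedron refinement in which every triangular face is split into four (new vertices at edge midpoints):

```python
# Node/edge/face counts for successive icosahedron refinements.
# Splitting each triangle into 4 adds one vertex per edge, so:
#   V' = V + E,  E' = 2E + 3F,  F' = 4F
V, E, F = 12, 30, 20  # the base icosahedron (mesh M0)
for level in range(7):
    assert V - E + F == 2  # Euler's formula holds at every level
    print(f"M{level}: {V} nodes, {E} edges, {F} faces")
    V, E, F = V + E, 2 * E + 3 * F, 4 * F
```

The finest level, M6, comes out to 40,962 nodes; the multi-mesh keeps this finest node set and overlays the edges of all coarser levels on top of it.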
- dataset split
- training - 1979-2015
- validation - 2016-2017
- test - 2018
- weather prediction strategy
- $$\hat{X}^{t+1} = \mathrm{GraphCast}(X^t, X^{t-1})$$
- for predicting multiple steps into the future, this equation is applied iteratively in an autoregressive fashion, feeding each prediction back in as input (sketched below)
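A minimal sketch of that rollout, with a hypothetical `model_fn` standing in for the trained one-step predictor (10 days at 6-hour steps = 40 iterations):

```python
import jax.numpy as jnp

def rollout(model_fn, x_prev, x_curr, num_steps=40):
    """Iterate the one-step predictor autoregressively (40 x 6 h = 10 days)."""
    preds = []
    for _ in range(num_steps):
        x_next = model_fn(x_curr, x_prev)   # X^{t+1} = model(X^t, X^{t-1})
        preds.append(x_next)
        x_prev, x_curr = x_curr, x_next     # feed the prediction back in
    return jnp.stack(preds)
```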
- encoder
- maps the input data into the learned features on the multi-mesh
- uses a GNN architecture
- with directed edges from the grid points to the multi-mesh nodes
- processor
- a 16-layer deep GNN that performs message passing on the multi-mesh (one layer is sketched below)
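A minimal sketch of one such layer in the interaction-network style, with hypothetical `edge_mlp`/`node_mlp` stand-ins for the learned MLPs (their outputs match the latent width so the residuals line up); the encoder applies the same pattern once over the directed grid-to-mesh edges:

```python
import jax.numpy as jnp
from jax.ops import segment_sum

def message_passing_layer(node_feats, edge_feats, senders, receivers,
                          edge_mlp, node_mlp):
    # Edge update: condition each edge on its feature and both endpoint nodes.
    edge_inputs = jnp.concatenate(
        [edge_feats, node_feats[senders], node_feats[receivers]], axis=-1)
    edge_feats = edge_feats + edge_mlp(edge_inputs)          # residual

    # Node update: sum the messages arriving at each receiver node.
    incoming = segment_sum(edge_feats, receivers,
                           num_segments=node_feats.shape[0])
    node_feats = node_feats + node_mlp(
        jnp.concatenate([node_feats, incoming], axis=-1))    # residual
    return node_feats, edge_feats
```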
- decoder
- the reverse of the encoder: maps the learned multi-mesh features back onto the lat-lon grid
- predicts only the change to the next timestep (NOT the value itself), i.e. a residual update (see below)
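Putting the pieces together, a one-step predictor then looks roughly like this (`core` is a hypothetical stand-in for the full encode-process-decode stack):

```python
def make_step_fn(core):
    """Wrap a core network that outputs increments into a one-step predictor."""
    def model_fn(x_curr, x_prev):
        delta = core(x_curr, x_prev)   # decoder output: the change, not the state
        return x_curr + delta          # the predicted next state
    return model_fn
```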
- training
- minimizing the objective function over 12 autoregressive timesteps (= 3 days) against ERA5
- gradient is computed by backprop through the entire autoregressive sequence
- can potentially be retrained/finetuned regularly based on recent weather data
- training took 3 weeks on 32 TPUv4 devices, using data parallelism, gradient checkpointing, and low-precision arithmetic
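A minimal sketch of the multi-step objective, using a plain MSE and a toy linear core so the example runs end to end (the paper's actual loss is a weighted version over the full variable set):

```python
import jax
import jax.numpy as jnp

def predict_delta(params, x_curr, x_prev):
    # Toy linear stand-in for GraphCast's encode-process-decode core.
    return params["w"] * x_curr + params["b"] * x_prev

def multi_step_loss(params, x_prev, x_curr, targets):
    """MSE accumulated over an autoregressive rollout.

    `targets` holds the next 12 ground-truth states (12 x 6 h = 3 days);
    gradients flow back through every step of the rollout."""
    loss = 0.0
    for target in targets:
        x_next = x_curr + predict_delta(params, x_curr, x_prev)
        loss = loss + jnp.mean((x_next - target) ** 2)
        x_prev, x_curr = x_curr, x_next
    return loss / targets.shape[0]

params = {"w": jnp.float32(0.5), "b": jnp.float32(0.1)}
x0, x1 = jnp.zeros(4), jnp.ones(4)
targets = jnp.ones((12, 4))
grads = jax.grad(multi_step_loss)(params, x0, x1, targets)
```

Keeping 12 steps of activations alive for backprop is expensive, which is presumably where the gradient checkpointing mentioned above (e.g. `jax.checkpoint` in JAX) comes in.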