MAESTRO: large-scale data management for better weather prediction

The MAESTRO project contributes to higher resolution and reliability in weather prediction through smart large-scale data management.

Data placement and movement are performance-critical for many applications. For instance, predicting extreme weather events such as floods, hurricanes or wildfires in real time and with higher accuracy is vital for making well-informed decisions about evacuations, which in turn reduce the economic impact and save human lives.


The MAESTRO project, supported by the research programme on Future and Emerging Technologies of the European Commission, has been set up to tackle one of the most important and difficult problems in High Performance Computing, namely the orchestration of data across multiple levels of the memory and storage hardware as well as the software stack.


MAESTRO may be of use to applications where data movement and data placement are performance-critical. One such application is numerical weather prediction (NWP), in particular global weather prediction. A global forecast model, such as the Integrated Forecast System (IFS) at the European Centre for Medium-Range Weather Forecasts (ECMWF), simulates the state of the atmosphere four times a day. Each time, the model collects the most recent observations from around the globe and constructs as accurate a description of the global atmosphere as possible within the model. Then a ‘high-resolution forecast’ and an ensemble of 51 forecasts simulate the atmosphere’s behaviour and predict its state up to 15 days ahead, or further ahead in the case of long-range and seasonal forecasts. Each of these 52 members outputs a snapshot of the predicted global atmosphere at regular intervals. The data are then processed to generate meteorological products, which are in turn disseminated to customers.


The operational forecast must be completed within a time-critical window of one hour. Given this strict constraint, outputting the forecast data and using them to create forecast products is the point of highest data contention (the bottleneck). The storage is put under immense strain because a large number of data items, amounting to a large total volume, are written to and read from the storage devices simultaneously. A smarter and more efficient way to orchestrate data movement and data placement between the forecast model, the storage and product generation is required to handle the increasing amount of data coming from future forecast models.
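
As a rough illustration of why the one-hour window is so demanding, the sustained throughput the storage must deliver can be estimated from the data volume and the length of the window. The following is a minimal sketch only: the function name and the 100 TiB example volume are hypothetical and are not ECMWF figures.

```python
def required_throughput_gib_s(volume_tib: float, window_s: float = 3600.0) -> float:
    """Sustained throughput (GiB/s) needed to move `volume_tib` TiB in `window_s` seconds."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    # 1 TiB = 1024 GiB
    return volume_tib * 1024.0 / window_s


# Hypothetical volume, chosen purely for illustration.
print(f"{required_throughput_gib_s(100.0):.1f} GiB/s")
```

Since the data are written by the model and read back for product generation within the same window, the real load on the storage would be higher than this single-pass estimate.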


Being able to handle more data within the operational window would improve the quality of weather forecasts in two major ways. First, it would enable running the ensemble forecast at a higher resolution. At present, all 51 ensemble members at ECMWF run at a 16 km resolution. Running the ensemble forecast at a 9 km resolution, for example, would greatly enhance its accuracy. Second, it would allow the number of ensemble members to be increased. Analysing the entire ensemble of forecasts provides information about how likely a future weather event is to occur. More ensemble members may thus increase the reliability of forecasting such events.
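
To give a feel for what these two improvements mean for data volume, here is a back-of-the-envelope sketch. It assumes that the number of horizontal grid points scales with the inverse square of the grid spacing and that output volume scales linearly with the number of ensemble members, with vertical levels, output frequency and output fields left unchanged. These scaling rules, the function name and the 101-member example are illustrative assumptions, not ECMWF figures.

```python
def output_volume_factor(old_km: float, new_km: float,
                         old_members: int = 51, new_members: int = 51) -> float:
    """Approximate growth factor of ensemble output volume (assumed scaling rules)."""
    if min(old_km, new_km) <= 0 or min(old_members, new_members) <= 0:
        raise ValueError("grid spacings and member counts must be positive")
    # Assumption: horizontal grid points grow with the inverse square of the spacing.
    horizontal = (old_km / new_km) ** 2
    # Assumption: each additional member adds the same amount of output.
    members = new_members / old_members
    return horizontal * members


# Refining the ensemble from 16 km to 9 km with the same 51 members.
print(round(output_volume_factor(16, 9), 2))  # about 3.16

# Hypothetical case: also increasing the ensemble to 101 members.
print(round(output_volume_factor(16, 9, 51, 101), 2))
```

Under these assumptions, the resolution change alone roughly triples the data that has to pass through storage within the same operational window.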


Higher accuracy and reliability are vital in preparation for extreme weather events, such as floods, hurricanes or wildfires. The sooner and more reliably decisions about evacuations can be made, the smaller the economic and human costs will be.


Photo “Simulated infrared image for water vapour in the global atmosphere for February 28, 2020 at 18 UTC” is courtesy of ECMWF