Efficient performance of the Met Office Unified Model v8.2 on Intel Xeon partially used nodes
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications 2015-03-01 |
Series: | Geoscientific Model Development |
Online Access: | http://www.geosci-model-dev.net/8/769/2015/gmd-8-769-2015.pdf |
Summary: | The atmospheric Unified Model (UM) developed at the UK Met Office is used for
weather and climate prediction by forecast teams at a number of international
meteorological centres and research institutes on a wide variety of hardware
and software environments. Over its 25-year history the UM sources have been
optimised for better application performance on a number of High Performance
Computing (HPC) systems including NEC SX vector architecture systems and
more recently the IBM Power6/Power7 platforms. Understanding the influence of the
compiler flags, Message Passing Interface (MPI) libraries and run
configurations is crucial to achieving the shortest elapsed times for a UM
application on any particular HPC system. These aspects are very important
for applications that must run within operational time frames. Driving the
current study is the HPC industry trend since 1980 for processor arithmetic
performance to increase at a faster rate than memory bandwidth. This gap has
been growing especially fast for multicore processors in the past 10 years
and it can have significant implications for the performance and performance
scaling of memory-bandwidth-intensive applications, such as the UM. Analysis
of partially used nodes on Intel Xeon clusters is provided in this paper for
short- and medium-range weather forecasting systems using global and
limited-area configurations. It is shown that on the Intel Xeon-based
clusters the fastest elapsed times and the most efficient system usage can be
achieved using partially committed nodes. |
---|---|
ISSN: | 1991-959X 1991-9603 |