Efficient performance of the Met Office Unified Model v8.2 on Intel Xeon partially used nodes

The atmospheric Unified Model (UM) developed at the UK Met Office is used for weather and climate prediction by forecast teams at a number of international meteorological centres and research institutes on a wide variety of hardware and software environments. Over its 25-year history the UM sources...


Bibliographic Details
Main Authors: I. Bermous, P. Steinle
Format: Article
Language: English
Published: Copernicus Publications 2015-03-01
Series: Geoscientific Model Development
Online Access: http://www.geosci-model-dev.net/8/769/2015/gmd-8-769-2015.pdf
author I. Bermous
P. Steinle
collection DOAJ
description The atmospheric Unified Model (UM) developed at the UK Met Office is used for weather and climate prediction by forecast teams at a number of international meteorological centres and research institutes on a wide variety of hardware and software environments. Over its 25-year history the UM sources have been optimised for better application performance on a number of High Performance Computing (HPC) systems, including NEC SX vector architecture systems and, more recently, the IBM Power6/Power7 platforms. Understanding the influence of compiler flags, Message Passing Interface (MPI) libraries and run configurations is crucial to achieving the shortest elapsed times for a UM application on any particular HPC system. These aspects are very important for applications that must run within operational time frames. Driving the current study is the HPC industry trend since 1980 for processor arithmetic performance to increase at a faster rate than memory bandwidth. This gap has been growing especially fast for multicore processors in the past 10 years, and it can have significant implications for the performance and performance scaling of memory-bandwidth-intensive applications such as the UM. Analysis of partially used nodes on Intel Xeon clusters is provided in this paper for short- and medium-range weather forecasting systems using global and limited-area configurations. It is shown that on the Intel Xeon-based clusters the fastest elapsed times and the most efficient system usage can be achieved using partially committed nodes.
first_indexed 2024-12-16T13:22:07Z
format Article
id doaj.art-16245789eae945ad9ac82f2b97cbaf13
institution Directory Open Access Journal
issn 1991-959X
1991-9603
language English
last_indexed 2024-12-16T13:22:07Z
publishDate 2015-03-01
publisher Copernicus Publications
record_format Article
series Geoscientific Model Development
spelling doaj.art-16245789eae945ad9ac82f2b97cbaf13; 2022-12-21T22:30:19Z; eng; Copernicus Publications; Geoscientific Model Development; ISSN 1991-959X, 1991-9603; 2015-03-01; vol. 8, no. 3, pp. 769-779; doi:10.5194/gmd-8-769-2015; Efficient performance of the Met Office Unified Model v8.2 on Intel Xeon partially used nodes; I. Bermous, P. Steinle (both: Centre for Australian Weather and Climate Research, the Australian Bureau of Meteorology, Melbourne, Australia); http://www.geosci-model-dev.net/8/769/2015/gmd-8-769-2015.pdf
title Efficient performance of the Met Office Unified Model v8.2 on Intel Xeon partially used nodes
url http://www.geosci-model-dev.net/8/769/2015/gmd-8-769-2015.pdf