An approach to enhance pnetCDF performance in environmental modeling applications

Bibliographic Details
Main Authors: D. C. Wong, C. E. Yang, J. S. Fu, K. Wong, Y. Gao
Affiliations: D. C. Wong, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA; C. E. Yang, J. S. Fu, K. Wong, and Y. Gao, University of Tennessee, Knoxville, TN, USA
Format: Article
Language: English
Published: Copernicus Publications, 2015-04-01
Series: Geoscientific Model Development, vol. 8, no. 4, pp. 1033-1046
ISSN: 1991-959X, 1991-9603
DOI: 10.5194/gmd-8-1033-2015
Collection: DOAJ (Directory of Open Access Journals)
Online Access: http://www.geosci-model-dev.net/8/1033/2015/gmd-8-1033-2015.pdf

Abstract

Data-intensive simulations are often limited by their I/O (input/output) performance, and "novel" techniques need to be developed to overcome this limitation. The software package pnetCDF (parallel network Common Data Form), which works with parallel file systems, was developed to address this issue by providing parallel I/O capability. This study examines the performance of an application-level data aggregation approach that aggregates data along either the row or the column dimension of MPI (Message Passing Interface) processes on a spatially decomposed domain and then applies the pnetCDF parallel I/O paradigm. Tests used a small-scale mock-up of the Community Multiscale Air Quality (CMAQ) model with three domain sizes representing small, moderately large, and large data domains. The comparison covers the traditional serial I/O technique, straight application of pnetCDF, and data aggregation along the row and column dimensions before applying pnetCDF. From this comparison, "optimal" I/O configurations of the application-level data aggregation approach were quantified. Data aggregation along the row dimension (pnetCDFcr) works better than aggregation along the column dimension (pnetCDFcc), although it may perform slightly worse than the straight pnetCDF method with a small number of processors. As the number of processors grows, pnetCDFcr significantly outperforms pnetCDF, and as it keeps increasing, pnetCDF reaches a point where its performance is even worse than that of the serial I/O technique. The new technique was also tested in a real application, where it performed two times better than the straight pnetCDF paradigm.
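
The abstract describes aggregating data along one dimension of the MPI process grid before handing it to pnetCDF. The serial Python sketch below is a hypothetical model of that idea, not the authors' code. It makes no MPI or PnetCDF calls, and the helper names (`split`, `decompose`, `aggregate_rows`, `contiguous_runs`) are invented for illustration. It assumes that "row aggregation" means every process in a process-grid row forwards its tile to the first process in that row, and that the output variable is stored row-major with the column index varying fastest. Under those assumptions it counts how many processes would write and how many contiguous file segments they would touch.

```python
"""Serial model of row-wise data aggregation before a parallel write.

Hypothetical illustration only: no MPI or PnetCDF calls are made. Each
"process" is a (prow, pcol) position in a 2-D process grid, and the file
layout is assumed row-major (column index varies fastest).
"""
from typing import Dict, List, Tuple

Tile = Tuple[int, int, int, int]  # (row_start, row_count, col_start, col_count)


def split(n: int, parts: int) -> List[Tuple[int, int]]:
    """Split range(n) into `parts` nearly equal (start, count) blocks."""
    base, extra = divmod(n, parts)
    blocks, start = [], 0
    for p in range(parts):
        count = base + (1 if p < extra else 0)
        blocks.append((start, count))
        start += count
    return blocks


def decompose(nrows: int, ncols: int, nprow: int, npcol: int) -> Dict[Tuple[int, int], Tile]:
    """Assign each process in an nprow x npcol grid its tile of the domain."""
    rb, cb = split(nrows, nprow), split(ncols, npcol)
    return {(pr, pc): (rb[pr][0], rb[pr][1], cb[pc][0], cb[pc][1])
            for pr in range(nprow) for pc in range(npcol)}


def local_data(field: List[List[float]], tile: Tile) -> List[List[float]]:
    """Extract the sub-block a process would own."""
    r0, nr, c0, nc = tile
    return [row[c0:c0 + nc] for row in field[r0:r0 + nr]]


def aggregate_rows(field, tiles, nprow, npcol):
    """Row leader (pcol == 0) assembles full-width slabs from its row's tiles.

    In a real code each tile would arrive via MPI communication; here the
    copy loop stands in for that step. Returns {prow: (row_start, slab)}.
    """
    ncols = len(field[0])
    slabs = {}
    for pr in range(nprow):
        r0, nr, _, _ = tiles[(pr, 0)]
        slab = [[0.0] * ncols for _ in range(nr)]
        for pc in range(npcol):
            _, _, c0, nc = tiles[(pr, pc)]
            block = local_data(field, tiles[(pr, pc)])
            for i in range(nr):
                slab[i][c0:c0 + nc] = block[i]
        slabs[pr] = (r0, slab)
    return slabs


def contiguous_runs(tile: Tile, ncols: int) -> int:
    """Count contiguous file segments a block touches in a row-major 2-D layout.

    Full-width blocks are one segment; narrower blocks need one per row.
    """
    _, nr, _, nc = tile
    if nr == 0:
        return 0
    return 1 if nc == ncols else nr


if __name__ == "__main__":
    nrows, ncols, nprow, npcol = 6, 8, 2, 4
    field = [[float(r * ncols + c) for c in range(ncols)] for r in range(nrows)]
    tiles = decompose(nrows, ncols, nprow, npcol)
    slabs = aggregate_rows(field, tiles, nprow, npcol)
    direct = sum(contiguous_runs(t, ncols) for t in tiles.values())
    merged = sum(contiguous_runs((r0, len(s), 0, ncols), ncols) for r0, s in slabs.values())
    print("no aggregation:", len(tiles), "writers,", direct, "segments")
    print("row aggregation:", len(slabs), "writers,", merged, "segments")
```

For a 6 x 8 grid on a 2 x 4 process grid, the script reports 8 writers touching 24 segments without aggregation and 2 writers touching 2 segments with row aggregation. Under the same assumed layout, a column-aggregated slab spans all rows but only part of the width, so it still maps to one segment per row. This is one plausible, unverified reason why pnetCDFcr could outperform pnetCDFcc. In a real MPI code the copy loop would become messages within a row communicator, and the row leaders would then issue a collective pnetCDF write.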