The impact of altering emission data precision on compression efficiency and accuracy of simulations of the community multiscale air quality model

Bibliographic Details
Main Authors: M. S. Walters, D. C. Wong
Format: Article
Language: English
Published: Copernicus Publications 2023-02-01
Series: Geoscientific Model Development
Online Access: https://gmd.copernicus.org/articles/16/1179/2023/gmd-16-1179-2023.pdf
_version_ 1828012070341181440
author M. S. Walters
M. S. Walters
D. C. Wong
collection DOAJ
description <p>The Community Multiscale Air Quality (CMAQ) model has been a vital tool for air quality research and management at the United States Environmental Protection Agency (US EPA) and at government environmental agencies and academic institutions worldwide. The CMAQ model requires a significant amount of disk space to store and archive input and output files. For example, an annual simulation over the contiguous United States (CONUS) with a horizontal grid-cell spacing of 12 km requires 2–3 TB of input data and can produce 7–45 TB of output data, depending on the modeling configuration and the desired post-processing of the output (e.g., for evaluations or graphics). After a simulation is complete, model data are archived for several years, or even decades, so that the research can be replicated. As a result, careful disk-space management is essential to optimize resources and ensure the uninterrupted progress of ongoing research and applications that require large-scale air quality modeling. Proper disk-space management may include applying data-compression techniques to the input and output files of all CMAQ simulations. Several utilities compress files losslessly, including GNU Gzip (gzip) and bzip2. This study proposes a new approach that reduces the precision of the emission input for air quality modeling in order to reduce storage requirements (after a lossless compression utility is applied) and accelerate runtime. The approach is tested using CMAQ simulations and post-processed CMAQ output to examine its impact on the performance of the air quality model. In total, four simulations were conducted, and nine cases were post-processed from direct simulation output, to determine disk-space efficiency, runtime efficiency, and model (predictive) accuracy. Three simulations were run with emission input containing only five, four, or three significant digits. To enhance the analysis of disk-space efficiency, the output from the altered-precision emission CMAQ simulations was additionally post-processed to contain five, four, or three significant digits. The fourth and final simulation was run using the full-precision emission files with no alteration. Thus, in total, 13 gridded products (4 simulations and 9 altered-precision output cases) were analyzed in this study.</p>
<p>Results demonstrate that, after compression with bzip2, the altered-precision emission files containing five, four, or three significant digits reduced the disk-space footprint by 6 %, 25 %, and 48 %, respectively, compared to the unaltered emission files. Similarly, after compression with bzip2, the altered output files containing five, four, or three significant digits reduced the required disk space by 19 %, 47 %, and 69 %, respectively, compared to the unaltered CMAQ output files. For both datasets, bzip2 produced smaller compressed files than gzip, by 5 %–27 % for emission data and 15 %–28 % for CMAQ output, for files containing five, four, or three significant digits. Additionally, CMAQ runtime was reduced by 2 %–7 % for simulations using emission files with reduced-precision data in a non-dedicated environment. Finally, the model-estimated pollutant concentrations from the four simulations were compared to observed data from the US EPA Air Quality System (AQS) and the Ammonia Monitoring Network (AMoN). The impact on model performance statistics was negligible. In summary, reducing the precision of CMAQ emission data to five, four, or three significant digits slightly reduced the simulation runtime in a non-dedicated environment and substantially reduced disk-space usage, while model accuracy remained relatively unchanged compared to the base CMAQ simulation. This suggests that the precision of the emission data could be reduced to use computing resources more efficiently while minimizing the impact on CMAQ simulations.</p>
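The idea in the description (fewer significant digits, then a lossless compressor) can be illustrated with a small sketch. This is not the authors' procedure: the abstract does not say how the precision was reduced or how CMAQ files were handled, so the rounding method, the synthetic uniform data, and the helper names `round_sig`, `compressed_sizes`, and `synthetic_field` are all assumptions made for illustration. Only the Python standard library is used.

```python
import bz2
import gzip
import random
from array import array


def round_sig(value: float, digits: int) -> float:
    """Round value to `digits` significant decimal digits (illustrative helper)."""
    if value == 0.0:
        return 0.0
    # Scientific notation keeps one digit before the decimal point,
    # so `digits - 1` decimals yields `digits` significant digits.
    return float(f"{value:.{digits - 1}e}")


def compressed_sizes(values: list[float]) -> dict[str, int]:
    """Pack values as 32-bit floats and return raw and compressed sizes in bytes."""
    raw = array("f", values).tobytes()
    return {
        "raw": len(raw),
        "gzip": len(gzip.compress(raw, compresslevel=9)),
        "bzip2": len(bz2.compress(raw, compresslevel=9)),
    }


def synthetic_field(n: int = 20000, seed: int = 0) -> list[float]:
    """Stand-in for an emission field: seeded uniform values (an assumption)."""
    rng = random.Random(seed)
    return [rng.uniform(1.0, 10.0) for _ in range(n)]


if __name__ == "__main__":
    field = synthetic_field()
    print("full precision:", compressed_sizes(field))
    for digits in (5, 4, 3):
        reduced = [round_sig(v, digits) for v in field]
        print(f"{digits} significant digits:", compressed_sizes(reduced))
```

The raw size is identical in every case because each value is still stored as a 32-bit float; any saving appears only after compression, since rounding leaves fewer distinct bit patterns for the compressor to encode. The percentages obtained on synthetic data should not be expected to match those reported for real CMAQ files.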
first_indexed 2024-04-10T09:24:47Z
format Article
id doaj.art-d3560db3766c46158cb521aca9eee911
institution Directory Open Access Journal
issn 1991-959X
1991-9603
language English
last_indexed 2024-04-10T09:24:47Z
publishDate 2023-02-01
publisher Copernicus Publications
record_format Article
series Geoscientific Model Development
spelling doaj.art-d3560db3766c46158cb521aca9eee911 2023-02-20T06:31:08Z eng Copernicus Publications Geoscientific Model Development 1991-959X 1991-9603 2023-02-01 16 1179 1190 10.5194/gmd-16-1179-2023
The impact of altering emission data precision on compression efficiency and accuracy of simulations of the community multiscale air quality model
M. S. Walters, Atmospheric and Environmental Systems Modeling Division, Center for Environmental Measurement and Modeling, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
M. S. Walters, Oak Ridge Associated Universities, Oak Ridge, TN, USA
D. C. Wong, Atmospheric and Environmental Systems Modeling Division, Center for Environmental Measurement and Modeling, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
https://gmd.copernicus.org/articles/16/1179/2023/gmd-16-1179-2023.pdf
title The impact of altering emission data precision on compression efficiency and accuracy of simulations of the community multiscale air quality model
url https://gmd.copernicus.org/articles/16/1179/2023/gmd-16-1179-2023.pdf