Evaluating lossy data compression on climate simulation data within a large ensemble
Main Authors: | , , , , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2016-12-01 |
Series: | Geoscientific Model Development |
Online Access: | http://www.geosci-model-dev.net/9/4381/2016/gmd-9-4381-2016.pdf |
Summary: | High-resolution Earth system model simulations generate enormous data volumes, and retaining the data from these simulations often strains institutional storage resources. Further, these exceedingly large storage requirements negatively impact science objectives, for example, by forcing reductions in data output frequency, simulation length, or ensemble size. To lessen data volumes from the Community Earth System Model (CESM), we advocate the use of lossy data compression techniques. While lossy data compression does not exactly preserve the original data (as lossless compression does), lossy techniques have an advantage in terms of smaller storage requirements. To preserve the integrity of the scientific simulation data, the effects of lossy data compression on the original data should, at a minimum, not be statistically distinguishable from the natural variability of the climate system, and previous preliminary work with data from CESM has shown this goal to be attainable. However, to ultimately convince climate scientists that it is acceptable to use lossy data compression, we provide climate scientists with access to publicly available climate data that have undergone lossy data compression. In particular, we report on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which we challenge climate scientists to examine features of the data relevant to their interests, and attempt to identify which of the ensemble members have been compressed and reconstructed. We find that while detecting distinguishing features is certainly possible, the compression effects noticeable in these features are often unimportant or disappear in post-processing analyses. In addition, we perform several analyses that directly compare the original data to the reconstructed data to investigate the preservation, or lack thereof, of specific features critical to climate science. Overall, we conclude that applying lossy data compression to climate simulation data is both advantageous in terms of data reduction and generally acceptable in terms of effects on scientific results. |
ISSN: | 1991-959X, 1991-9603 |