The compression–error trade-off for large gridded data sets
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2017-01-01 |
Series: | Geoscientific Model Development |
Online Access: | http://www.geosci-model-dev.net/10/413/2017/gmd-10-413-2017.pdf |
Summary: | The netCDF-4 format is widely used for large gridded scientific data sets and
includes several compression methods: lossy linear scaling and the non-lossy
deflate and shuffle algorithms. Many multidimensional geoscientific data sets
exhibit considerable variation over one or several spatial dimensions (e.g.,
vertically) with less variation in the remaining dimensions (e.g.,
horizontally). On such data sets, linear scaling with a single pair of scale
and offset parameters often entails considerable loss of precision. We
introduce an alternative compression method called "layer-packing" that
simultaneously exploits lossy linear scaling and lossless compression.
Layer-packing stores arrays (instead of a scalar pair) of scale and offset
parameters. An implementation of this method is compared with lossless
compression, storing data at fixed relative precision (bit-grooming) and
scalar linear packing in terms of compression ratio, accuracy and speed.
When viewed as a trade-off between compression and error, layer-packing
yields similar results to bit-grooming (storing between 3 and 4 significant
figures). Bit-grooming and layer-packing offer significantly better control
of precision than scalar linear packing. Relative performance, in terms of
compression and errors, of bit-groomed and layer-packed data was strongly
predicted by the entropy of the exponent array, and lossless compression was
well predicted by the entropy of the original data array. Layer-packed data files
must be "unpacked" to be readily usable. The compression and precision
characteristics make layer-packing a competitive archive format for many
scientific data sets. |
ISSN: | 1991-959X 1991-9603 |
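
The contrast the summary draws between scalar linear packing and layer-packing can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`pack_scalar`, `pack_layers`, `unpack`), the choice of unsigned 16-bit storage, and the synthetic field that decays exponentially with level are all assumptions made for illustration only.

```python
import numpy as np

# Assumption: values are packed into unsigned 16-bit integers.
NLEVELS = 2**16 - 1


def _pack(data, lo, hi):
    """Quantize data linearly between lo and hi (scalars or broadcastable arrays)."""
    span = hi - lo
    # Guard against constant data, where the span is zero.
    scale = np.where(span > 0, span / NLEVELS, 1.0)
    packed = np.round((data - lo) / scale).astype(np.uint16)
    return packed, scale, lo


def pack_scalar(data):
    """Scalar linear packing: one scale/offset pair for the whole array."""
    return _pack(data, data.min(), data.max())


def pack_layers(data, axis=0):
    """Layer-packing sketch: one scale/offset pair per index along `axis`."""
    other_axes = tuple(i for i in range(data.ndim) if i != axis)
    lo = data.min(axis=other_axes, keepdims=True)
    hi = data.max(axis=other_axes, keepdims=True)
    return _pack(data, lo, hi)


def unpack(packed, scale, offset):
    """Recover approximate floating-point values from packed integers."""
    return packed.astype(np.float64) * scale + offset


def max_rel_err(approx, exact):
    """Largest relative error over the whole array."""
    return float(np.max(np.abs(approx - exact) / np.abs(exact)))


# Hypothetical test field: strong variation across 20 vertical levels,
# weak (5 %) variation within each horizontal layer.
rng = np.random.default_rng(0)
base = 1e5 * np.exp(-np.arange(20))[:, None, None]
field = base * (1.0 + rng.uniform(-0.05, 0.05, size=(20, 32, 32)))

rel_err_scalar = max_rel_err(unpack(*pack_scalar(field)), field)
rel_err_layer = max_rel_err(unpack(*pack_layers(field, axis=0)), field)

print(f"scalar packing, max relative error: {rel_err_scalar:.3e}")
print(f"layer packing,  max relative error: {rel_err_layer:.3e}")
```

In this sketch the per-layer scale and offset arrays add only two floating-point values per level, while the quantization step in each layer is matched to that layer's own range rather than to the range of the whole array. The packed integers could then be passed to a lossless compressor such as deflate, which is the combination the summary describes.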