eddy4R 0.2.0: a DevOps model for community-extensible processing and analysis of eddy-covariance data based on R, Git, Docker, and HDF5
Main Authors: | , , , , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2017-08-01 |
Series: | Geoscientific Model Development |
Online Access: | https://www.geosci-model-dev.net/10/3189/2017/gmd-10-3189-2017.pdf |
Summary: | Large differences in instrumentation, site setup, data format, and operating
system stymie the adoption of a universal computational environment for
processing and analyzing eddy-covariance (EC) data. This results in limited
software applicability and extensibility in addition to often substantial
inconsistencies in flux estimates. Addressing these concerns, this paper
presents the systematic development of portable, reproducible, and
extensible EC software achieved by adopting a development and systems
operation (DevOps) approach. This software development model is used for the
creation of the eddy4R family of EC code packages in the open-source
R language for statistical computing. These packages are community developed,
iterated via the Git distributed version control system, and wrapped into a
portable and reproducible Docker filesystem that is independent of the
underlying host operating system. The HDF5 hierarchical data format then
provides a streamlined mechanism for highly compressed and fully
self-documented data ingest and output.
<br><br>
The usefulness of the DevOps approach was evaluated for three test
applications. First, the resultant EC processing software was used to
analyze standard flux tower data from the first EC instruments installed at
a National Ecological Observatory Network (NEON) field site. Second, through an
aircraft test application, we demonstrate the modular extensibility of eddy4R
to analyze EC data from other platforms. Third, an intercomparison with
commercial-grade software showed excellent agreement (<i>R</i><sup>2</sup> = 1.0 for CO<sub>2</sub> flux). In conjunction with this study, a Docker image containing the first two eddy4R packages and an executable example workflow, as well as the first NEON EC data products, are released publicly. We conclude by describing the work remaining to arrive at the automated generation of science-grade EC fluxes and the benefits to the science community at large.
<br><br>
This software development model is applicable beyond EC and more generally
builds the capacity to deploy complex algorithms developed by scientists in
an efficient and scalable manner. In addition, modularity permits meeting
project milestones while retaining extensibility with time. |
---|---|
ISSN: | 1991-959X, 1991-9603 |