Machine learning models inaccurately predict current and future high-latitude C balances

The high-latitude carbon (C) cycle is a key feedback to the global climate system, yet because of system complexity and data limitations, there is currently disagreement over whether the region is a source or sink of C. Recent advances in big data analytics and computing power have popularized the u...


Bibliographic Details
Main Authors:	Ian A Shirley, Zelalem A Mekonnen, Robert F Grant, Baptiste Dafflon, William J Riley
Format:	Article
Language:	English
Published:	IOP Publishing 2023-01-01
Series:	Environmental Research Letters
Subjects:	machine learning; upscaling; forecasting; independent evaluation; carbon cycle; high-latitudes
Online Access:	https://doi.org/10.1088/1748-9326/acacb2

_version_	1797747253331361792
author	Ian A Shirley; Zelalem A Mekonnen; Robert F Grant; Baptiste Dafflon; William J Riley
author_facet	Ian A Shirley; Zelalem A Mekonnen; Robert F Grant; Baptiste Dafflon; William J Riley
author_sort	Ian A Shirley
collection	DOAJ
description	The high-latitude carbon (C) cycle is a key feedback to the global climate system, yet because of system complexity and data limitations, there is currently disagreement over whether the region is a source or sink of C. Recent advances in big data analytics and computing power have popularized the use of machine learning (ML) algorithms to upscale site measurements of ecosystem processes, and in some cases forecast the response of these processes to climate change. Due to data limitations, however, ML model predictions of these processes are almost never validated with independent datasets. To better understand and characterize the limitations of these methods, we develop an approach to independently evaluate ML upscaling and forecasting. We mimic data-driven upscaling and forecasting efforts by applying ML algorithms to different subsets of regional process-model simulation gridcells, and then test ML performance using the remaining gridcells. In this study, we simulate C fluxes and environmental data across Alaska using ecosys, a process-rich terrestrial ecosystem model, and then apply boosted regression tree ML algorithms to training data configurations that mirror and expand upon existing AmeriFLUX eddy-covariance data availability. We first show that an ML model trained using ecosys outputs from currently available Alaska AmeriFLUX sites incorrectly predicts that Alaska is presently a modeled net C source. Increased spatial coverage of the training dataset improves ML predictions, halving the bias when 240 modeled sites are used instead of 15. However, even this more accurate ML model incorrectly predicts Alaska C fluxes under 21st century climate change because of changes in atmospheric CO₂, litter inputs, and vegetation composition that have impacts on C fluxes which cannot be inferred from the training data. Our results provide key insights to future C flux upscaling efforts and expose the potential for inaccurate ML upscaling and forecasting of high-latitude C cycle dynamics.
first_indexed	2024-03-12T15:48:11Z
format	Article
id	doaj.art-aadf6efca9614e839824ed7acaea4520
institution	Directory Open Access Journal
issn	1748-9326
language	English
last_indexed	2024-03-12T15:48:11Z
publishDate	2023-01-01
publisher	IOP Publishing
record_format	Article
series	Environmental Research Letters
spelling	doaj.art-aadf6efca9614e839824ed7acaea4520 | 2023-08-09T15:20:18Z | eng | IOP Publishing | Environmental Research Letters | 1748-9326 | 2023-01-01 | 18 | 1 | 014026 | 10.1088/1748-9326/acacb2 | Machine learning models inaccurately predict current and future high-latitude C balances | Ian A Shirley (https://orcid.org/0000-0002-2229-1414): Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America; Department of Physics, University of California-Berkeley, Berkeley 94720-3114, CA, United States of America | Zelalem A Mekonnen (https://orcid.org/0000-0002-2647-0671): Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America | Robert F Grant (https://orcid.org/0000-0002-8890-6231): Department of Renewable Resources, University of Alberta, Edmonton, Canada | Baptiste Dafflon (https://orcid.org/0000-0001-9871-5650): Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America | William J Riley (https://orcid.org/0000-0002-4615-2304): Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America | https://doi.org/10.1088/1748-9326/acacb2 | machine learning; upscaling; forecasting; independent evaluation; carbon cycle; high-latitudes
title	Machine learning models inaccurately predict current and future high-latitude C balances
topic	machine learning; upscaling; forecasting; independent evaluation; carbon cycle; high-latitudes
url	https://doi.org/10.1088/1748-9326/acacb2

Similar Items