Exploring Transferable Techniques to Retrieve Crop Biophysical and Biochemical Variables Using Sentinel-2 Data


Bibliographic Details
Main Authors:	Mahlatse Kganyago, Clement Adjorlolo, Paidamwoyo Mhangara
Format:	Article
Language:	English
Published:	MDPI AG 2022-08-01
Series:	Remote Sensing
Subjects:	spatial transferability; machine learning; leaf area index; precision agriculture; chlorophyll content; Sentinel-2
Online Access:	https://www.mdpi.com/2072-4292/14/16/3968

author	Mahlatse Kganyago, Clement Adjorlolo, Paidamwoyo Mhangara
collection	DOAJ
description	The current study aimed to determine the spatial transferability of eXtreme Gradient Boosting (XGBoost) models for estimating biophysical and biochemical variables (BVs) using Sentinel-2 data. The specific objectives were to: (1) assess the effect of different proportions of training samples (i.e., 25%, 50%, and 75%) available at the Target site ($\mathcal{D}_T$) on the spatial transferability of the XGBoost models and (2) evaluate the effect of the accuracy of the Source site ($\mathcal{D}_S$) (i.e., trained) model on the retrieval uncertainty at the Target site (i.e., unseen). The results showed that the Bothaville ($\mathcal{D}_S$) → Harrismith ($\mathcal{D}_T$) Leaf Area Index (LAI) models required only smaller proportions (25% or 50%) of the training samples to make optimal retrievals in $\mathcal{D}_T$ (RMSE: 0.61 m² m⁻²; $R^2$: 59%), while the Harrismith ($\mathcal{D}_S$) → Bothaville ($\mathcal{D}_T$) LAI models required up to 75% of the training samples in $\mathcal{D}_T$ to obtain optimal LAI retrievals (RMSE: 0.63 m² m⁻²; $R^2$: 67%). In contrast, the Bothaville ($\mathcal{D}_S$) → Harrismith ($\mathcal{D}_T$) chlorophyll content models required a large proportion of samples (75%) from $\mathcal{D}_T$ to make optimal retrievals of Leaf Chlorophyll Content (LC$_{ab}$) (RMSE: 7.09 µg cm⁻²; $R^2$: 58%) and Canopy Chlorophyll Content (CCC) (RMSE: 36.3 µg cm⁻²; $R^2$: 61%), while the Harrismith ($\mathcal{D}_S$) → Bothaville ($\mathcal{D}_T$) models required only 25% of the samples to achieve RMSEs of 8.16 µg cm⁻² ($R^2$: 83%) and 40.25 µg cm⁻² ($R^2$: 77%) for LC$_{ab}$ and CCC, respectively. The results also showed that source-site model accuracy improved transferability for LAI retrievals, whereas higher accuracy of the LC$_{ab}$ and CCC source-site models did not necessarily improve their transferability. Overall, the results highlight the potential of transferable Machine Learning Regression Algorithms and are significant for the rapid retrieval of important crop BVs in data-scarce areas, thus facilitating spatially explicit information for site-specific farm management.
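The transfer protocol summarized in the abstract (a model trained at the source site, then given 25%, 50%, or 75% of the target-site samples, and scored with RMSE and $R^2$) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say how target samples are combined with source samples, so the sketch assumes simple pooling and evaluation on the held-out target samples; the regressor is a hypothetical one-feature least-squares stand-in for XGBoost (a real model such as `xgboost.XGBRegressor` could be substituted); and `fit_linear`, `transfer_experiment`, and the synthetic data are illustrative only.

```python
import math
import random

# Hypothetical sketch: the pooling strategy, the stand-in model, and the toy
# data are assumptions, not the study's method or measurements.


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear(xs, ys):
    """Hypothetical stand-in for XGBoost: 1-D ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def transfer_experiment(source, target, fraction, seed=0):
    """Pool all source samples with `fraction` of the target samples,
    fit the model, and score it on the remaining target samples.

    `source` and `target` are lists of (feature, value) pairs.
    Returns (rmse, r2, number_of_held_out_target_samples)."""
    rng = random.Random(seed)
    shuffled = target[:]
    rng.shuffle(shuffled)
    k = int(round(fraction * len(shuffled)))
    target_train, target_test = shuffled[:k], shuffled[k:]
    train = source + target_train  # assumed pooling of source and target data
    model = fit_linear([x for x, _ in train], [y for _, y in train])
    y_true = [y for _, y in target_test]
    y_pred = [model(x) for x, _ in target_test]
    return rmse(y_true, y_pred), r2(y_true, y_pred), len(target_test)


if __name__ == "__main__":
    gen = random.Random(42)
    # Synthetic toy data only; the target relation is shifted to mimic a site difference.
    source = [(x, 6.0 * x + gen.gauss(0, 0.3)) for x in (gen.uniform(0.1, 0.9) for _ in range(80))]
    target = [(x, 5.0 * x + 0.5 + gen.gauss(0, 0.3)) for x in (gen.uniform(0.1, 0.9) for _ in range(40))]
    for frac in (0.25, 0.50, 0.75):
        err, score, n_test = transfer_experiment(source, target, frac)
        print(f"{frac:.0%} of target used: RMSE={err:.2f}, R2={score:.2f}, n_test={n_test}")
```

Each run reports the held-out RMSE and $R^2$ for one target-sample proportion. The printed numbers come from toy data and say nothing about the study's actual results; pooling was chosen only because it is the simplest way to let target samples influence a model trained at the source site.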
first_indexed	2024-03-09T03:53:18Z
format	Article
id	doaj.art-3780061543514a70ad9088221e042d26
institution	Directory Open Access Journal
issn	2072-4292
language	English
last_indexed	2024-03-09T03:53:18Z
publishDate	2022-08-01
publisher	MDPI AG
record_format	Article
series	Remote Sensing
spelling	doaj.art-3780061543514a70ad9088221e042d26 (2023-12-03T14:24:25Z); eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2022-08-01; Vol. 14, No. 16, 3968; DOI 10.3390/rs14163968; affiliation of all three authors: School of Geography, Archaeology and Environmental Studies, University of the Witwatersrand, Johannesburg 2050, South Africa; https://www.mdpi.com/2072-4292/14/16/3968
title	Exploring Transferable Techniques to Retrieve Crop Biophysical and Biochemical Variables Using Sentinel-2 Data
topic	spatial transferability; machine learning; leaf area index; precision agriculture; chlorophyll content; Sentinel-2
url	https://www.mdpi.com/2072-4292/14/16/3968


Similar Items