Exploring Transferable Techniques to Retrieve Crop Biophysical and Biochemical Variables Using Sentinel-2 Data

The current study aimed to determine the spatial transferability of eXtreme Gradient Boosting (XGBoost) models for estimating biophysical and biochemical variables (BVs), using Sentinel-2 data. The specific objectives were to: (1) assess the effect of different proportions of training samples (i.e.,...

Bibliographic Details
Main Authors: Mahlatse Kganyago, Clement Adjorlolo, Paidamwoyo Mhangara
Format: Article
Language: English
Published: MDPI AG 2022-08-01
Series: Remote Sensing
Subjects:
Online Access: https://www.mdpi.com/2072-4292/14/16/3968
_version_ 1797408089130926080
author Mahlatse Kganyago
Clement Adjorlolo
Paidamwoyo Mhangara
author_sort Mahlatse Kganyago
collection DOAJ
description The current study aimed to determine the spatial transferability of eXtreme Gradient Boosting (XGBoost) models for estimating biophysical and biochemical variables (BVs) using Sentinel-2 data. The specific objectives were to: (1) assess the effect of different proportions of training samples (i.e., 25%, 50%, and 75%) available at the Target site ($\mathcal{D}_T$) on the spatial transferability of the XGBoost models and (2) evaluate the effect of the Source site ($\mathcal{D}_S$) (i.e., trained) model accuracy on the Target site (i.e., unseen) retrieval uncertainty. The results showed that the Bothaville ($\mathcal{D}_S$) → Harrismith ($\mathcal{D}_T$) Leaf Area Index (LAI) models required smaller proportions (i.e., 25% or 50%) of the training samples to make optimal retrievals in the $\mathcal{D}_T$ (i.e., RMSE: 0.61 m² m⁻²; R²: 59%), while the Harrismith ($\mathcal{D}_S$) → Bothaville ($\mathcal{D}_T$) LAI models required up to 75% of the training samples in the $\mathcal{D}_T$ to obtain optimal LAI retrievals (i.e., RMSE: 0.63 m² m⁻²; R²: 67%). In contrast, the chlorophyll content models for Bothaville ($\mathcal{D}_S$) → Harrismith ($\mathcal{D}_T$) required a large proportion of samples (i.e., 75%) from the $\mathcal{D}_T$ to make optimal retrievals of Leaf Chlorophyll Content (LC$_{ab}$) (i.e., RMSE: 7.09 µg cm⁻²; R²: 58%) and Canopy Chlorophyll Content (CCC) (i.e., RMSE: 36.3 µg cm⁻²; R²: 61%), while the Harrismith ($\mathcal{D}_S$) → Bothaville ($\mathcal{D}_T$) models required only 25% of the samples to achieve RMSEs of 8.16 µg cm⁻² (R²: 83%) and 40.25 µg cm⁻² (R²: 77%) for LC$_{ab}$ and CCC, respectively. The results also showed that higher source-site model accuracy led to better transferability for LAI retrievals. In contrast, the accuracy of the LC$_{ab}$ and CCC source-site models did not necessarily improve their transferability. Overall, the results elucidate the potential of transferable Machine Learning Regression Algorithms and are significant for the rapid retrieval of important crop BVs in data-scarce areas, thus facilitating spatially explicit information for site-specific farm management.
first_indexed 2024-03-09T03:53:18Z
format Article
id doaj.art-3780061543514a70ad9088221e042d26
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-09T03:53:18Z
publishDate 2022-08-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-3780061543514a70ad9088221e042d26
2023-12-03T14:24:25Z
eng
MDPI AG
Remote Sensing
2072-4292
2022-08-01
14
16
3968
10.3390/rs14163968
Exploring Transferable Techniques to Retrieve Crop Biophysical and Biochemical Variables Using Sentinel-2 Data
Mahlatse Kganyago (0), Clement Adjorlolo (1), Paidamwoyo Mhangara (2)
(0) School of Geography, Archaeology and Environmental Studies, University of the Witwatersrand, Johannesburg 2050, South Africa
(1) School of Geography, Archaeology and Environmental Studies, University of the Witwatersrand, Johannesburg 2050, South Africa
(2) School of Geography, Archaeology and Environmental Studies, University of the Witwatersrand, Johannesburg 2050, South Africa
https://www.mdpi.com/2072-4292/14/16/3968
spatial transferability
machine learning
leaf area index
precision agriculture
chlorophyll content
Sentinel-2
title Exploring Transferable Techniques to Retrieve Crop Biophysical and Biochemical Variables Using Sentinel-2 Data
topic spatial transferability
machine learning
leaf area index
precision agriculture
chlorophyll content
Sentinel-2
url https://www.mdpi.com/2072-4292/14/16/3968