Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods

Spatially continuous soil thickness data at large scales are usually not readily available and are often difficult and expensive to acquire. Various machine learning algorithms have become very popular in digital soil mapping to predict and map the spatial distribution of soil properties. Identifyin...

Bibliographic Details
Main Authors: Xinchuan Li, Juhua Luo, Xiuliang Jin, Qiaoning He, Yun Niu
Format: Article
Language: English
Published: MDPI AG 2020-11-01
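Series: Remote Sensing
Subjects: soil thickness; random forest; extreme gradient boosting; variable selection; machine learning; stacking ensemble method
Online Access: https://www.mdpi.com/2072-4292/12/21/3609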
collection DOAJ
description Spatially continuous soil thickness data at large scales are usually not readily available and are often difficult and expensive to acquire. Machine learning algorithms have become very popular in digital soil mapping for predicting and mapping the spatial distribution of soil properties. Identifying the environmental variables that control soil thickness and selecting suitable machine learning algorithms are vitally important in modeling. In this study, 11 quantitative and four qualitative environmental variables were selected to explore the main variables that affect soil thickness. Four commonly used machine learning algorithms (multiple linear regression (MLR), support vector regression (SVR), random forest (RF), and extreme gradient boosting (XGBoost)) were evaluated as individual models to predict soil thickness and produce a soil thickness distribution map for Henan Province, China. In addition, two stacking ensemble models, one using the least absolute shrinkage and selection operator (LASSO) and one using a generalized boosted regression model (GBM), were tested and applied to build the most reliable and accurate estimation model. The results showed that variable selection was a very important part of soil thickness modeling. Topographic wetness index (TWI), slope, elevation, land use, and enhanced vegetation index (EVI) were the most influential environmental variables. Comparative results showed that the XGBoost model outperformed the MLR, RF, and SVR models. Importantly, the two stacking models achieved higher performance than the single models, especially when using GBM. In terms of accuracy, the proposed stacking method explained 64.0% of the variation in soil thickness. The results of our study provide useful alternative approaches for mapping soil thickness, with potential for use with other soil properties.
first_indexed 2024-03-10T15:08:13Z
id doaj.art-96cd909fbf994a5292d6afbff6841aa4
institution Directory of Open Access Journals
issn 2072-4292
last_indexed 2024-03-10T15:08:13Z
spelling doaj.art-96cd909fbf994a5292d6afbff6841aa4; 2023-11-20T19:37:40Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2020-11-01; volume 12, issue 21, article 3609; DOI 10.3390/rs12213609
Xinchuan Li (0), Juhua Luo (1), Xiuliang Jin (2), Qiaoning He (3), Yun Niu (4)
0, 1: Key Laboratory of Watershed Geographic Sciences, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 210008, China
2: Institute of Crop Sciences, Chinese Academy of Agricultural Sciences/Key Laboratory of Crop Physiology and Ecology, Ministry of Agriculture, Beijing 100081, China
3, 4: School of Urban and Environmental Sciences, Huaiyin Normal University, Huai’an 223300, China
title Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods
topic soil thickness
random forest
extreme gradient boosting
variable selection
machine learning
stacking ensemble method
url https://www.mdpi.com/2072-4292/12/21/3609