Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods
Spatially continuous soil thickness data at large scales are usually not readily available and are often difficult and expensive to acquire. Various machine learning algorithms have become very popular in digital soil mapping to predict and map the spatial distribution of soil properties. Identifyin...
Main Authors: | Xinchuan Li, Juhua Luo, Xiuliang Jin, Qiaoning He, Yun Niu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-11-01 |
Series: | Remote Sensing |
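Subjects: | soil thickness; random forest; extreme gradient boosting; variable selection; machine learning; stacking ensemble method |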
Online Access: | https://www.mdpi.com/2072-4292/12/21/3609 |
_version_ | 1797548963757293568 |
---|---|
author | Xinchuan Li Juhua Luo Xiuliang Jin Qiaoning He Yun Niu |
author_facet | Xinchuan Li Juhua Luo Xiuliang Jin Qiaoning He Yun Niu |
author_sort | Xinchuan Li |
collection | DOAJ |
description | Spatially continuous soil thickness data at large scales are usually not readily available and are often difficult and expensive to acquire. Various machine learning algorithms have become very popular in digital soil mapping to predict and map the spatial distribution of soil properties. Identifying the controlling environmental variables of soil thickness and selecting suitable machine learning algorithms are vitally important in modeling. In this study, 11 quantitative and four qualitative environmental variables were selected to explore the main variables that affect soil thickness. Four commonly used machine learning algorithms (multiple linear regression (MLR), support vector regression (SVR), random forest (RF), and extreme gradient boosting (XGBoost)) were evaluated as individual models to separately predict and obtain a soil thickness distribution map in Henan Province, China. In addition, two stacking ensemble models using the least absolute shrinkage and selection operator (LASSO) and a generalized boosted regression model (GBM) were tested and applied to build the most reliable and accurate estimation model. The results showed that variable selection was a very important part of soil thickness modeling. Topographic wetness index (TWI), slope, elevation, land use and enhanced vegetation index (EVI) were the most influential environmental variables in soil thickness modeling. Comparative results showed that the XGBoost model outperformed the MLR, RF and SVR models. Importantly, the two stacking models achieved higher performance than the single models, especially when using GBM. In terms of accuracy, the proposed stacking method explained 64.0% of the variation in soil thickness. The results of our study provide useful alternative approaches for mapping soil thickness, with potential for use with other soil properties. |
first_indexed | 2024-03-10T15:08:13Z |
format | Article |
id | doaj.art-96cd909fbf994a5292d6afbff6841aa4 |
institution | Directory Open Access Journal |
issn | 2072-4292 |
language | English |
last_indexed | 2024-03-10T15:08:13Z |
publishDate | 2020-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj.art-96cd909fbf994a5292d6afbff6841aa4; 2023-11-20T19:37:40Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2020-11-01; 12; 21; 3609; 10.3390/rs12213609; Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods; Xinchuan Li (0); Juhua Luo (1); Xiuliang Jin (2); Qiaoning He (3); Yun Niu (4); Key Laboratory of Watershed Geographic Sciences, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 210008, China; Key Laboratory of Watershed Geographic Sciences, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 210008, China; Institute of Crop Sciences, Chinese Academy of Agricultural Sciences/Key Laboratory of Crop Physiology and Ecology, Ministry of Agriculture, Beijing 100081, China; School of Urban and Environmental Sciences, Huaiyin Normal University, Huai’an 223300, China; School of Urban and Environmental Sciences, Huaiyin Normal University, Huai’an 223300, China; Spatially continuous soil thickness data at large scales are usually not readily available and are often difficult and expensive to acquire. Various machine learning algorithms have become very popular in digital soil mapping to predict and map the spatial distribution of soil properties. Identifying the controlling environmental variables of soil thickness and selecting suitable machine learning algorithms are vitally important in modeling. In this study, 11 quantitative and four qualitative environmental variables were selected to explore the main variables that affect soil thickness. Four commonly used machine learning algorithms (multiple linear regression (MLR), support vector regression (SVR), random forest (RF), and extreme gradient boosting (XGBoost)) were evaluated as individual models to separately predict and obtain a soil thickness distribution map in Henan Province, China. In addition, two stacking ensemble models using the least absolute shrinkage and selection operator (LASSO) and a generalized boosted regression model (GBM) were tested and applied to build the most reliable and accurate estimation model. The results showed that variable selection was a very important part of soil thickness modeling. Topographic wetness index (TWI), slope, elevation, land use and enhanced vegetation index (EVI) were the most influential environmental variables in soil thickness modeling. Comparative results showed that the XGBoost model outperformed the MLR, RF and SVR models. Importantly, the two stacking models achieved higher performance than the single models, especially when using GBM. In terms of accuracy, the proposed stacking method explained 64.0% of the variation in soil thickness. The results of our study provide useful alternative approaches for mapping soil thickness, with potential for use with other soil properties.; https://www.mdpi.com/2072-4292/12/21/3609; soil thickness; random forest; extreme gradient boosting; variable selection; machine learning; stacking ensemble method |
spellingShingle | Xinchuan Li Juhua Luo Xiuliang Jin Qiaoning He Yun Niu Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods Remote Sensing soil thickness random forest extreme gradient boosting variable selection machine learning stacking ensemble method |
title | Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods |
title_full | Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods |
title_fullStr | Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods |
title_full_unstemmed | Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods |
title_short | Improving Soil Thickness Estimations Based on Multiple Environmental Variables with Stacking Ensemble Methods |
title_sort | improving soil thickness estimations based on multiple environmental variables with stacking ensemble methods |
topic | soil thickness random forest extreme gradient boosting variable selection machine learning stacking ensemble method |
url | https://www.mdpi.com/2072-4292/12/21/3609 |
work_keys_str_mv | AT xinchuanli improvingsoilthicknessestimationsbasedonmultipleenvironmentalvariableswithstackingensemblemethods AT juhualuo improvingsoilthicknessestimationsbasedonmultipleenvironmentalvariableswithstackingensemblemethods AT xiuliangjin improvingsoilthicknessestimationsbasedonmultipleenvironmentalvariableswithstackingensemblemethods AT qiaoninghe improvingsoilthicknessestimationsbasedonmultipleenvironmentalvariableswithstackingensemblemethods AT yunniu improvingsoilthicknessestimationsbasedonmultipleenvironmentalvariableswithstackingensemblemethods |
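
The description in this record mentions stacking ensemble models, in which a meta-learner combines the predictions of several individual models. As a rough illustration only (not the authors' code, data, or settings), the idea can be sketched in pure Python as below. Everything here is an assumption made for the example: the base learners are a one-feature least-squares line and a k-nearest-neighbour average, standing in for MLR, SVR, RF and XGBoost; the meta-learner is an unregularised least-squares blend, standing in for LASSO or GBM; the function names and the synthetic two-feature data are hypothetical.

```python
import math
import random

def fit_linear(X, y):
    """Least-squares line on feature 0 only; returns a predict(row) function."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sxx
    return lambda row: my + slope * (row[0] - mx)

def fit_knn(X, y, k=5):
    """Mean of the k nearest training targets; returns a predict(row) function."""
    def predict(row):
        order = sorted(range(len(X)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], row)))
        return sum(y[i] for i in order[:k]) / k
    return predict

# Hypothetical stand-ins for the individual models named in the description.
BASE_LEARNERS = [fit_linear, fit_knn]

def fit_meta(P, y):
    """Least-squares weights (no intercept) for two columns of base predictions."""
    a11 = sum(p[0] * p[0] for p in P)
    a12 = sum(p[0] * p[1] for p in P)
    a22 = sum(p[1] * p[1] for p in P)
    b1 = sum(p[0] * t for p, t in zip(P, y))
    b2 = sum(p[1] * t for p, t in zip(P, y))
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the 2x2 normal equations.
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)

def fit_stack(X, y, n_folds=5):
    """Stacking: out-of-fold base predictions are the meta-learner's inputs."""
    oof = [[0.0] * len(BASE_LEARNERS) for _ in X]
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held = [i for i in range(len(X)) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(BASE_LEARNERS):
            model = fit(X_tr, y_tr)
            for i in held:
                oof[i][j] = model(X[i])  # prediction for a sample the model never saw
    weights = fit_meta(oof, y)
    models = [fit(X, y) for fit in BASE_LEARNERS]  # refit on all training data
    return lambda row: sum(w * m(row) for w, m in zip(weights, models))

def r2(y, pred):
    """Coefficient of determination (share of variation explained)."""
    my = sum(y) / len(y)
    sse = sum((t - p) ** 2 for t, p in zip(y, pred))
    sst = sum((t - my) ** 2 for t in y)
    return 1.0 - sse / sst

# Synthetic data for illustration only; it has no connection to the study's samples.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(300)]
y = [2 * a + math.sin(6 * b) + random.gauss(0, 0.1) for a, b in X]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

predictors = {
    "linear": fit_linear(X_train, y_train),
    "knn": fit_knn(X_train, y_train),
    "stack": fit_stack(X_train, y_train),
}
scores = {name: r2(y_test, [f(row) for row in X_test]) for name, f in predictors.items()}
print(scores)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for samples the base models were fitted to, keeps the blend from simply rewarding whichever base model overfits most. In practice a library implementation with regularised or boosted meta-learners would replace these toy components.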