Comparing Machine Learning Models and Hybrid Geostatistical Methods Using Environmental and Soil Covariates for Soil pH Prediction

Bibliographic Details
Main Authors: Panagiotis Tziachris, Vassilis Aschonitis, Theocharis Chatzistathis, Maria Papadopoulou, Ioannis (John) D. Doukas
Format: Article
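Language: English
Published: MDPI AG 2020-04-01
Series: ISPRS International Journal of Geo-Information
Subjects: machine learning, geostatistics, hybrid geostatistical methods, soil pH, environmental variables
Online Access: https://www.mdpi.com/2220-9964/9/4/276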
author Panagiotis Tziachris
Vassilis Aschonitis
Theocharis Chatzistathis
Maria Papadopoulou
Ioannis (John) D. Doukas
collection DOAJ
description This paper assesses machine learning (ML) models and hybrid geostatistical methods for predicting soil pH from digital elevation model derivatives (environmental covariates) and co-located soil parameters (soil covariates). The study area was Grevena, Greece, where 266 disturbed soil samples were collected from randomly selected locations and analyzed in the laboratory of the Soil and Water Resources Institute. The models assessed were random forests (RF), random forests kriging (RFK), gradient boosting (GB), gradient boosting kriging (GBK), neural networks (NN), and neural networks kriging (NNK). Multiple linear regression (MLR), ordinary kriging (OK), and regression kriging (RK) are not ML models but were included for comparison. The GB and RF models gave the best results in the study, with NN a close second. Applying OK to the ML models’ residuals did not have a major impact. The classical geostatistical and hybrid geostatistical methods without ML (OK, MLR, and RK) predicted less accurately than the models that included ML. Different implementations (methods and packages) of the same ML models were also assessed. For RF and GB, the implementations applied (ranger-ranger, randomForest-rf, xgboost-xgbTree, xgboost-xgbDART) led to similar results, whereas for NN the differences between the implementations used (nnet-nnet and nnet-avNNet) were more distinct. Finally, ML models tuned through a random search optimization method were compared with the same models at their default values. Optimization improved the predictions only where the ML algorithm had a large number of hyperparameters to tune and the optimized values differed significantly from the defaults, as in GB and NN but not in RF. Overall, the study concluded that although RF and GB had approximately the same prediction accuracy, RF gave more consistent results regardless of package, hyperparameter selection method, or the inclusion of OK on the ML models’ residuals.
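The hybrid methods named in the description (RK, RFK, GBK, NNK) share one pattern: a regression model predicts the trend from the covariates, ordinary kriging interpolates that model's residuals, and the two are summed. The following is a minimal Python sketch of that pattern, not the authors' implementation. It assumes a least-squares trend (the RK case), an exponential variogram with fixed, hand-picked parameters rather than a fitted one, and synthetic data; the function names `exp_variogram`, `ordinary_kriging`, and `regression_kriging` are hypothetical.

```python
import numpy as np

def exp_variogram(h, nugget, sill, range_):
    """Exponential semivariogram, with gamma(0) = 0 by convention."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_))
    return np.where(h > 0.0, gamma, 0.0)

def ordinary_kriging(xy, z, xy_new, nugget=0.0, sill=1.0, range_=1.0):
    """Ordinary kriging of values z at coordinates xy onto xy_new.

    The variogram parameters are assumed known here; in practice they
    would be fitted to an empirical variogram of z.
    """
    n = len(xy)
    # Pairwise distances between the sampled locations.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Kriging system with a Lagrange row/column enforcing sum(weights) = 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, nugget, sill, range_)
    A[n, n] = 0.0
    # Right-hand sides: one column per prediction location.
    d0 = np.linalg.norm(xy[:, None, :] - xy_new[None, :, :], axis=2)
    b = np.ones((n + 1, len(xy_new)))
    b[:n, :] = exp_variogram(d0, nugget, sill, range_)
    weights = np.linalg.solve(A, b)[:n]  # drop the Lagrange multiplier
    return weights.T @ z

def regression_kriging(xy, X, y, xy_new, X_new, **variogram):
    """Least-squares trend plus ordinary kriging of the trend residuals."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    trend_new = np.column_stack([np.ones(len(X_new)), X_new]) @ beta
    return trend_new + ordinary_kriging(xy, residuals, xy_new, **variogram)

# Synthetic illustration only; these are not the study's soil samples.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 1.0, size=(40, 2))   # sample coordinates
X = rng.normal(size=(40, 3))               # stand-in covariates
y = 6.5 + X @ np.array([0.3, -0.2, 0.1]) + 0.2 * np.sin(4.0 * xy[:, 0])
xy_new = rng.uniform(0.0, 1.0, size=(5, 2))
X_new = rng.normal(size=(5, 3))
pred = regression_kriging(xy, X, y, xy_new, X_new, sill=0.05, range_=0.5)
```

Replacing the least-squares step with the predictions of a fitted RF, GB, or NN model would give the corresponding RFK, GBK, or NNK variant; the kriging step on the residuals stays the same.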
first_indexed 2024-03-10T20:17:56Z
format Article
id doaj.art-fc7a30342f0b4fd296e8540dbd0fe467
institution Directory Open Access Journal
issn 2220-9964
language English
last_indexed 2024-03-10T20:17:56Z
publishDate 2020-04-01
publisher MDPI AG
record_format Article
series ISPRS International Journal of Geo-Information
spelling doaj.art-fc7a30342f0b4fd296e8540dbd0fe467; 2023-11-19T22:27:54Z; eng; MDPI AG; ISPRS International Journal of Geo-Information; 2220-9964; 2020-04-01; 9; 4; 276; 10.3390/ijgi9040276
Comparing Machine Learning Models and Hybrid Geostatistical Methods Using Environmental and Soil Covariates for Soil pH Prediction
Panagiotis Tziachris (0), Vassilis Aschonitis (1), Theocharis Chatzistathis (2), Maria Papadopoulou (3), Ioannis (John) D. Doukas (4)
0: Soil and Water Resources Institute, Hellenic Agricultural Organization (H.A.O.)-DEMETER, 570 01 Thessaloniki, Greece
1: Soil and Water Resources Institute, Hellenic Agricultural Organization (H.A.O.)-DEMETER, 570 01 Thessaloniki, Greece
2: Soil and Water Resources Institute, Hellenic Agricultural Organization (H.A.O.)-DEMETER, 570 01 Thessaloniki, Greece
3: Department of Cadastre, Photogrammetry and Cartography, Faculty of Engineering, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
4: School of Civil Engineering, Faculty of Engineering, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
https://www.mdpi.com/2220-9964/9/4/276
machine learning; geostatistics; hybrid geostatistical methods; soil pH; environmental variables
title Comparing Machine Learning Models and Hybrid Geostatistical Methods Using Environmental and Soil Covariates for Soil pH Prediction
topic machine learning
geostatistics
hybrid geostatistical methods
soil pH
environmental variables
url https://www.mdpi.com/2220-9964/9/4/276