A Comparative Analysis of Machine Learning Techniques for National Glacier Mapping: Evaluating Performance through Spatial Cross-Validation in Perú
Accurate glacier mapping is crucial for assessing future water security in Andean ecosystems. Traditional accuracy assessment may be biased due to overlooking spatial autocorrelation during map validation. In recent years, spatial cross-validation (CV) strategies have been proposed in environmental...
Main Authors: | Marcelo Bueno, Briggitte Macera, Nilton Montoya |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2023-12-01 |
Series: | Water |
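Subjects: | spatial modeling; machine learning; glacier mapping; glacier retreat; climate change; spatial autocorrelation |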
Online Access: | https://www.mdpi.com/2073-4441/15/24/4214 |
_version_ | 1797379076613210112 |
---|---|
author | Marcelo Bueno; Briggitte Macera; Nilton Montoya |
author_sort | Marcelo Bueno |
collection | DOAJ |
description | Accurate glacier mapping is crucial for assessing future water security in Andean ecosystems. Traditional accuracy assessment may be biased because it overlooks spatial autocorrelation during map validation. In recent years, spatial cross-validation (CV) strategies have been proposed in environmental and ecological modeling to reduce bias in estimates of predictive accuracy. In this study, we demonstrate the influence of spatial autocorrelation on the accuracy assessment of predictive models of glacier surface. We do so by comparing the performance of several widely used machine learning algorithms, including gradient-boosting machines (GBM), k-nearest neighbors (KNN), random forest (RF), and logistic regression (LR), for mapping nine main Peruvian glacier regions. Spatial and non-spatial cross-validation methods were used to evaluate the models' classification errors in terms of the Matthews correlation coefficient. Performance differences of up to 18% were found between bias-reduced (spatial) and overoptimistic (non-spatial) cross-validation results. Under spatial CV alone, k-nearest neighbors was the best-performing model in the Huallanca (0.90), Huayhuasha (0.78), Huaytapallana (0.96), Raura (0.93), Urubamba (0.96), Vilcabamba (0.93), and Vilcanota (0.92) regions, followed by logistic regression, which performed best in the Blanca (0.95) and Central (0.97) regions. Our validation approach, which accounts for spatial characteristics, provides valuable insights for glacier mapping studies and future efforts on glacier retreat monitoring. Incorporating this approach improves the reliability of glacier mapping and can guide future national-level initiatives. |
first_indexed | 2024-03-08T20:16:53Z |
format | Article |
id | doaj.art-33072fb3043a4ec38f6a69bb50c7a36d |
institution | Directory of Open Access Journals |
issn | 2073-4441 |
language | English |
last_indexed | 2024-03-08T20:16:53Z |
publishDate | 2023-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Water |
spelling | doaj.art-33072fb3043a4ec38f6a69bb50c7a36d; 2023-12-22T14:49:42Z; eng; MDPI AG; Water; ISSN 2073-4441; 2023-12-01; vol. 15, no. 24, art. 4214; DOI 10.3390/w15244214; A Comparative Analysis of Machine Learning Techniques for National Glacier Mapping: Evaluating Performance through Spatial Cross-Validation in Perú; Marcelo Bueno, Briggitte Macera, Nilton Montoya (all: Departamento Académico de Agricultura, Universidad Nacional de San Antonio Abad del Cusco (UNSAAC), Cusco 08000, Peru); https://www.mdpi.com/2073-4441/15/24/4214; spatial modeling; machine learning; glacier mapping; glacier retreat; climate change; spatial autocorrelation |
title | A Comparative Analysis of Machine Learning Techniques for National Glacier Mapping: Evaluating Performance through Spatial Cross-Validation in Perú |
topic | spatial modeling; machine learning; glacier mapping; glacier retreat; climate change; spatial autocorrelation |
url | https://www.mdpi.com/2073-4441/15/24/4214 |
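
The description above scores classifiers with the Matthews correlation coefficient under spatial and non-spatial cross-validation. As a rough illustration only, the Python sketch below computes the coefficient for binary labels and assigns samples to folds by grid block, so that neighboring samples are held out together. The record does not state which spatial CV scheme or software the authors used; the grid-block scheme, the function names `mcc` and `spatial_block_folds`, and the parameters `block_size` and `n_folds` are all assumptions made for this example.

```python
import math
from collections import defaultdict


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = glacier, 0 = other)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def spatial_block_folds(coords, block_size, n_folds):
    """Group sample indices into folds by the square grid block that contains them.

    Hypothetical scheme: samples in the same block always share a fold, so
    spatially close samples are not split between training and test sets.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(i)
    folds = [[] for _ in range(n_folds)]
    # Deal whole blocks to folds in a fixed, sorted order.
    for k, key in enumerate(sorted(blocks)):
        folds[k % n_folds].extend(blocks[key])
    return folds


if __name__ == "__main__":
    coords = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (1.7, 0.4), (0.3, 1.6), (1.2, 1.9)]
    for test_idx in spatial_block_folds(coords, block_size=1.0, n_folds=2):
        print("held-out samples:", test_idx)
    print("MCC:", mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

A non-spatial CV would instead shuffle individual samples into folds, which lets near-duplicate neighbors appear on both sides of the split; that leakage is the likely source of the overoptimistic scores the description mentions.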