A Comparative Analysis of Machine Learning Techniques for National Glacier Mapping: Evaluating Performance through Spatial Cross-Validation in Perú

Accurate glacier mapping is crucial for assessing future water security in Andean ecosystems. Traditional accuracy assessment may be biased due to overlooking spatial autocorrelation during map validation. In recent years, spatial cross-validation (CV) strategies have been proposed in environmental...

Bibliographic Details
Main Authors: Marcelo Bueno, Briggitte Macera, Nilton Montoya
Format: Article
Language: English
Published: MDPI AG 2023-12-01
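Series: Water
Subjects: spatial modeling, machine learning, glacier mapping, glacier retreat, climate change, spatial autocorrelation
Online Access: https://www.mdpi.com/2073-4441/15/24/4214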
_version_ 1797379076613210112
author Marcelo Bueno
Briggitte Macera
Nilton Montoya
author_sort Marcelo Bueno
collection DOAJ
description Accurate glacier mapping is crucial for assessing future water security in Andean ecosystems. Traditional accuracy assessment may be biased due to overlooking spatial autocorrelation during map validation. In recent years, spatial cross-validation (CV) strategies have been proposed in environmental and ecological modeling to reduce bias in predictive accuracy. In this study, we demonstrate the influence of spatial autocorrelation on the accuracy assessment of glacier surface predictive models. This is achieved by comparing the performance of several widely used machine learning algorithms, including gradient-boosting machines (GBM), k-nearest neighbors (KNN), random forest (RF), and logistic regression (LR), for mapping nine main Peruvian glacier regions. Spatial and non-spatial cross-validation methods were used to evaluate the models’ classification errors in terms of the Matthews correlation coefficient. Performance differences of up to 18% were found between bias-reduced (spatial) and overoptimistic (non-spatial) cross-validation results. Under spatial CV alone, k-nearest neighbors was the best-performing model overall, achieving the highest scores in the Huallanca (0.90), Huayhuasha (0.78), Huaytapallana (0.96), Raura (0.93), Urubamba (0.96), Vilcabamba (0.93), and Vilcanota (0.92) regions, followed by logistic regression, which performed best in the Blanca (0.95) and Central (0.97) regions. Our validation approach, accounting for spatial characteristics, provides valuable insights for glacier mapping studies and future efforts in glacier retreat monitoring. Incorporating this approach improves the reliability of glacier mapping, guiding future national-level initiatives.
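The contrast between spatial and non-spatial cross-validation scored with the Matthews correlation coefficient can be illustrated with a minimal Python sketch. This record does not state which spatial partitioning scheme the authors used, so the square-block assignment below is only an assumption chosen for illustration; the function names `mcc`, `random_folds`, and `spatial_block_folds`, and the `block_size` parameter, are hypothetical and do not come from the article.

```python
import math
import random
from collections import defaultdict


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def random_folds(n, k, seed=0):
    """Non-spatial CV: shuffle sample indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def spatial_block_folds(coords, k, block_size):
    """Spatial CV (assumed block scheme): whole square blocks go to one fold,
    so neighbouring samples are not split between training and test sets."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)
    folds = [[] for _ in range(k)]
    for j, key in enumerate(sorted(blocks)):
        folds[j % k].extend(blocks[key])
    return folds


# Hypothetical 10 x 10 grid of sample locations, split into 5 x 5 blocks.
coords = [(x, y) for x in range(10) for y in range(10)]
spatial = spatial_block_folds(coords, k=2, block_size=5)
non_spatial = random_folds(len(coords), k=5)
print([len(f) for f in spatial], [len(f) for f in non_spatial])
print(mcc([1, 1, 0, 0], [1, 1, 0, 0]))
```

With random folds, a test sample usually has close neighbours in the training set, which is one way spatial autocorrelation can inflate the score; holding out whole blocks removes most of those neighbours.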
first_indexed 2024-03-08T20:16:53Z
format Article
id doaj.art-33072fb3043a4ec38f6a69bb50c7a36d
institution Directory Open Access Journal
issn 2073-4441
language English
last_indexed 2024-03-08T20:16:53Z
publishDate 2023-12-01
publisher MDPI AG
record_format Article
series Water
spelling doaj.art-33072fb3043a4ec38f6a69bb50c7a36d 2023-12-22T14:49:42Z eng MDPI AG Water 2073-4441 2023-12-01 15(24), 4214 10.3390/w15244214
A Comparative Analysis of Machine Learning Techniques for National Glacier Mapping: Evaluating Performance through Spatial Cross-Validation in Perú
Marcelo Bueno (0), Departamento Académico de Agricultura, Universidad Nacional de San Antonio Abad del Cusco (UNSAAC), Cusco 08000, Peru
Briggitte Macera (1), Departamento Académico de Agricultura, Universidad Nacional de San Antonio Abad del Cusco (UNSAAC), Cusco 08000, Peru
Nilton Montoya (2), Departamento Académico de Agricultura, Universidad Nacional de San Antonio Abad del Cusco (UNSAAC), Cusco 08000, Peru
https://www.mdpi.com/2073-4441/15/24/4214
spatial modeling; machine learning; glacier mapping; glacier retreat; climate change; spatial autocorrelation
title A Comparative Analysis of Machine Learning Techniques for National Glacier Mapping: Evaluating Performance through Spatial Cross-Validation in Perú
title_sort comparative analysis of machine learning techniques for national glacier mapping evaluating performance through spatial cross validation in peru
topic spatial modeling
machine learning
glacier mapping
glacier retreat
climate change
spatial autocorrelation
url https://www.mdpi.com/2073-4441/15/24/4214