Evaluating capabilities of machine learning algorithms for aquatic vegetation classification in temperate wetlands using multi-temporal Sentinel-2 data

Bibliographic Details
Main Authors: Erika Piaser, Paolo Villa
Format: Article
Language: English
Published: Elsevier 2023-03-01
Series: International Journal of Applied Earth Observations and Geoinformation
Subjects: Supervised classification; Wetland vegetation; Spectral indices; Random forest; Support vector machine
Online Access: http://www.sciencedirect.com/science/article/pii/S1569843223000249
Collection: DOAJ
Abstract: Studies applying machine learning (ML) algorithms from different perspectives have shown that their performance depends on the quality of reference data. This is particularly true when the targets are complex environments such as wetlands, for which the vast majority of studies are site-specific and based on a single date. In this work, an extensive reference dataset of about 400,000 samples was collected, covering nine different sites and multiple seasons, so as to be representative of temperate wetland vegetation communities at continental scale. Using this dataset, the performance of selected ML classifiers was compared for detailed mapping of wetland vegetation types, with spectral indices (SI) derived from multi-temporal Sentinel-2 composites as input. Global and per-class accuracy metrics were computed on four independent training and testing subsets extracted from the overall dataset, and the impact of varying the number of input features and the sites covered was assessed. Our results show generally higher predictive power for ensemble methods, such as Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), than for standalone ones, with the notable exception of Support Vector Machine (SVM), which was in fact the algorithm that scored the highest overall accuracy (0.977 ± 0.001) and the highest F-score for all target classes. Reducing the number of input features generally led to losses in classification accuracy, less marked for RF than for SVM, while site-specific training showed SVM to be more stable, indicating stronger transferability for SVM than for RF and XGBoost.
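The abstract reports a global metric (overall accuracy) and a per-class metric (F-score) computed on four independent training and testing subsets, with overall accuracy given as mean ± standard deviation. The following is a minimal, standard-library-only Python sketch of how such metrics can be computed; it assumes that "F-score" means F1 and that the ± term is the sample standard deviation across subsets. The function names and class labels are hypothetical illustrations, not the authors' code.

```python
from statistics import mean, stdev


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the reference class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_f_score(y_true, y_pred, labels):
    """Per-class F1: harmonic mean of precision (user's accuracy) and recall (producer's accuracy)."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)  # samples the classifier assigned to c
        actual = sum(1 for t in y_true if t == c)     # reference samples belonging to c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def summarize_runs(runs):
    """Mean and sample standard deviation of overall accuracy over (y_true, y_pred) test subsets."""
    accuracies = [overall_accuracy(t, p) for t, p in runs]
    return mean(accuracies), stdev(accuracies)


if __name__ == "__main__":
    # Hypothetical class names and predictions for four independent test subsets (toy data only).
    labels = ["emergent", "floating", "submerged"]
    runs = [
        (["emergent", "floating", "submerged", "emergent"], ["emergent", "floating", "submerged", "floating"]),
        (["emergent", "floating", "submerged", "submerged"], ["emergent", "floating", "submerged", "submerged"]),
        (["floating", "floating", "submerged", "emergent"], ["floating", "emergent", "submerged", "emergent"]),
        (["emergent", "submerged", "submerged", "floating"], ["emergent", "submerged", "submerged", "floating"]),
    ]
    oa_mean, oa_std = summarize_runs(runs)
    print(f"overall accuracy: {oa_mean:.3f} ± {oa_std:.3f}")
    for y_true, y_pred in runs:
        print(per_class_f_score(y_true, y_pred, labels))
```

Precision and recall in this sketch correspond to user's and producer's accuracy in remote-sensing terminology. Libraries such as scikit-learn provide equivalent metrics; the plain version is used here only to keep the definitions explicit.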
ISSN: 1569-8432
Volume: 117, article 103202
Affiliations:
Erika Piaser (corresponding author): Institute for Electromagnetic Sensing of the Environment, National Research Council (IREA-CNR), Milan, Italy; Department of Civil and Environmental Engineering (DICA), Politecnico di Milano, Milan, Italy
Paolo Villa: Institute for Electromagnetic Sensing of the Environment, National Research Council (IREA-CNR), Milan, Italy