Evaluating capabilities of machine learning algorithms for aquatic vegetation classification in temperate wetlands using multi-temporal Sentinel-2 data
Main Authors: | Erika Piaser, Paolo Villa |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-03-01 |
Series: | International Journal of Applied Earth Observations and Geoinformation |
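Subjects: | Supervised classification; Wetland vegetation; Spectral indices; Random forest; Support vector machine |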
Online Access: | http://www.sciencedirect.com/science/article/pii/S1569843223000249 |
author | Erika Piaser, Paolo Villa |
collection | DOAJ |
description | Studies applying machine learning (ML) algorithms from different perspectives have shown that their performance depends on the quality of the reference data. This is particularly true when the targets are complex environments such as wetlands, for which the vast majority of studies are site-specific and based on a single date. In this work, an extensive reference dataset of about 400,000 samples was collected, covering nine different sites and multiple seasons, and intended to be representative of temperate wetland vegetation communities at continental scale. Starting from this dataset, the performance of selected ML classifiers was compared for detailed wetland vegetation type mapping, using spectral indices (SI) derived from multi-temporal Sentinel-2 composites as input. Global and per-class accuracy metrics were computed on four independent training and testing subsets extracted from the overall dataset, and the impacts of varying the number of input features and the sites covered were assessed. Our results show generally higher predictive power for ensemble methods, such as Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), than for standalone ones, with the notable exception of Support Vector Machine (SVM), which is in fact the algorithm that scored the highest overall accuracy (0.977 ± 0.001) and F-score for all the target classes. Decreasing the number of input features generally resulted in classification accuracy losses, less marked for RF than for SVM, while site-specific algorithm training showed greater stability for SVM, indicating stronger transferability of SVM than of RF and XGBoost. |
first_indexed | 2024-04-10T15:06:05Z |
format | Article |
id | doaj.art-1b3c875b8f474c1cb51f1f62d833192c |
institution | Directory of Open Access Journals |
issn | 1569-8432 |
language | English |
last_indexed | 2024-04-10T15:06:05Z |
publishDate | 2023-03-01 |
publisher | Elsevier |
record_format | Article |
series | International Journal of Applied Earth Observations and Geoinformation |
affiliation | Erika Piaser: Institute for Electromagnetic Sensing of the Environment, National Research Council (IREA-CNR), Milan, Italy; Department of Civil and Environmental Engineering (DICA), Politecnico di Milano, Milan, Italy (corresponding author). Paolo Villa: Institute for Electromagnetic Sensing of the Environment, National Research Council (IREA-CNR), Milan, Italy |
volume | 117 |
article_number | 103202 |
title | Evaluating capabilities of machine learning algorithms for aquatic vegetation classification in temperate wetlands using multi-temporal Sentinel-2 data |
topic | Supervised classification; Wetland vegetation; Spectral indices; Random forest; Support vector machine |
url | http://www.sciencedirect.com/science/article/pii/S1569843223000249 |
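
The description reports overall accuracy as a mean ± spread over four independent test subsets, together with per-class F-scores. A minimal standard-library Python sketch of these metrics follows. It assumes the ± value is the sample standard deviation of overall accuracy across subsets, and the class labels, predictions, and two toy subsets (instead of four) are placeholders, not data from the study; it does not reproduce the authors' pipeline.

```python
from statistics import mean, stdev


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the reference class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f_score(y_true, y_pred):
    """F-score (F1: harmonic mean of precision and recall) for each class."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[cls] = (2 * precision * recall / (precision + recall)
                       if precision + recall else 0.0)
    return scores


def summarize_runs(runs):
    """Mean and sample standard deviation of overall accuracy.

    `runs` is a list of (y_true, y_pred) pairs, one per independent
    test subset (assumption: ± is the sample standard deviation).
    """
    oas = [overall_accuracy(t, p) for t, p in runs]
    return mean(oas), (stdev(oas) if len(oas) > 1 else 0.0)


# Toy example: two hypothetical test subsets with placeholder classes A, B, C.
runs = [
    (["A", "A", "B", "C"], ["A", "A", "B", "C"]),
    (["A", "B", "B", "C"], ["A", "B", "C", "C"]),
]
mean_oa, sd_oa = summarize_runs(runs)
print(f"OA = {mean_oa:.3f} ± {sd_oa:.3f}")
print(per_class_f_score(*runs[1]))
```

Counting true positives, false positives, and false negatives per class keeps the sketch dependency-free; for about 400,000 samples a vectorized confusion matrix would be faster.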
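
The classifiers take spectral indices computed on multi-temporal Sentinel-2 composites as input, but the description names neither the indices nor the compositing rule. The sketch below assumes one index, NDVI, computed from Sentinel-2 band B8 (near-infrared) and band B4 (red), and a per-pixel median over the valid observations within each period; the function names, period labels, and reflectance values are hypothetical.

```python
from statistics import median


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); Sentinel-2 B8 is NIR, B4 is red.

    Returns None when the denominator is zero (e.g. masked pixels).
    """
    denom = nir + red
    return (nir - red) / denom if denom else None


def median_composite(values):
    """Median of the valid (non-None) index values within one period."""
    valid = [v for v in values if v is not None]
    return median(valid) if valid else None


def pixel_features(observations_by_period):
    """One composited NDVI value per period for a single pixel.

    `observations_by_period` maps a hypothetical period label to a list of
    (B8, B4) surface-reflectance pairs acquired within that period.
    """
    return {
        period: median_composite([ndvi(b8, b4) for b8, b4 in pairs])
        for period, pairs in sorted(observations_by_period.items())
    }


# Hypothetical reflectances; the (0.0, 0.0) pair stands in for a masked date.
features = pixel_features({
    "period_1": [(0.30, 0.10), (0.28, 0.12)],
    "period_2": [(0.45, 0.05), (0.40, 0.05), (0.0, 0.0)],
})
print(features)
```

Each pixel's per-period values (one per index, in the general case) would then form the feature vector passed to RF, XGBoost, or SVM.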