Summary: | Studies approaching machine learning (ML) algorithms from different perspectives have shown that their performance depends on the quality of the reference data. This is particularly true when the targets are complex environments such as wetlands, for which the vast majority of studies are site-specific and based on a single date. In this work, an extensive reference dataset of about 400,000 samples was collected, covering nine different sites and multiple seasons, and intended to be representative of temperate wetland vegetation communities at continental scale. Starting from this dataset, the performance of selected ML classifiers was compared for detailed wetland vegetation type mapping, using as input spectral indices (SI) derived from multi-temporal Sentinel-2 composites. Global and per-class accuracy metrics were computed on four independent training and testing subsets extracted from the overall dataset, and the impact of varying the number of input features and the sites covered was assessed. Our results show generally higher predictive power for ensemble methods, such as Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), than for standalone ones, with the notable exception of Support Vector Machine (SVM): SVM was in fact the algorithm that scored the highest overall accuracy (0.977 ± 0.001) and the highest F-score for all target classes. Reducing the number of input features generally led to losses in classification accuracy, less marked for RF than for SVM, while site-specific training showed SVM to be more stable, indicating stronger transferability for SVM than for RF and XGBoost.
|
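The accuracy figures in this summary are reported as a mean ± spread over four independent training/testing subsets. The following is a minimal sketch, in plain Python, of how overall accuracy and per-class F-scores could be aggregated across such subsets. It assumes that the F-score is F1 and that the ± value is the sample standard deviation across subsets; the summary specifies neither. The function names and class labels are hypothetical and do not describe the authors' pipeline.

```python
from statistics import mean, stdev


def overall_accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted class matches the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_score(y_true, y_pred, cls):
    """Per-class F1 (assumed F-score variant): harmonic mean of precision and recall."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no correct predictions: F1 is 0 (also avoids division by zero)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize(runs, classes):
    """Mean and sample standard deviation of OA and per-class F1 over test subsets.

    `runs` is a list of (y_true, y_pred) pairs, one per independent subset
    (at least two are needed for stdev).
    """
    oas = [overall_accuracy(t, p) for t, p in runs]
    summary = {"OA": (mean(oas), stdev(oas))}
    for cls in classes:
        fs = [f_score(t, p, cls) for t, p in runs]
        summary[cls] = (mean(fs), stdev(fs))
    return summary


if __name__ == "__main__":
    # Hypothetical toy labels for four subsets; not data from the study.
    truth = ["reed", "reed", "sedge", "water"]
    perfect = ["reed", "reed", "sedge", "water"]
    one_error = ["reed", "sedge", "sedge", "water"]
    runs = [(truth, perfect), (truth, one_error), (truth, perfect), (truth, one_error)]
    for name, (m, s) in summarize(runs, ["reed", "sedge", "water"]).items():
        print(f"{name}: {m:.3f} ± {s:.3f}")
```

Computing metrics per subset and then aggregating is what makes the spread meaningful. A single pooled confusion matrix would hide the variability between training/testing splits.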