Monitoring Forest Health Using Hyperspectral Imagery: Does Feature Selection Improve the Performance of Machine-Learning Techniques?

Bibliographic Details
Main Authors: Patrick Schratz, Jannes Muenchow, Eugenia Iturritxa, José Cortés, Bernd Bischl, Alexander Brenning
Format: Article
Language: English
Published: MDPI AG 2021-11-01
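Series: Remote Sensing
Subjects: hyperspectral imagery; forest health monitoring; machine learning; feature selection; model comparison
Online Access: https://www.mdpi.com/2072-4292/13/23/4832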
author Patrick Schratz
Jannes Muenchow
Eugenia Iturritxa
José Cortés
Bernd Bischl
Alexander Brenning
collection DOAJ
description This study analyzed highly correlated, feature-rich datasets from hyperspectral remote sensing data using multiple statistical and machine-learning methods. The effect of filter-based feature selection methods on predictive performance was compared. In addition, the effect of multiple expert-based and data-driven feature sets, derived from the reflectance data, was investigated. Defoliation of trees (%), derived from in situ measurements from fall 2016, was modeled as a function of reflectance. Variable importance was assessed using permutation-based feature importance. Overall, the support vector machine (SVM) outperformed other algorithms, such as random forest (RF), extreme gradient boosting (XGBoost), and lasso (L1) and ridge (L2) regressions by at least three percentage points. The combination of certain feature sets showed small increases in predictive performance, while no substantial differences between individual feature sets were observed. For some combinations of learners and feature sets, filter methods achieved better predictive performances than using no feature selection. Ensemble filters did not have a substantial impact on performance. The most important features were located around the red edge. Additional features in the near-infrared region (800–1000 nm) were also essential to achieve the overall best performances. Filter methods have the potential to be helpful in high-dimensional situations and are able to improve the interpretation of feature effects in fitted models, which is an essential constraint in environmental modeling studies. Nevertheless, more training data and replication in similar benchmarking studies are needed to be able to generalize the results.
first_indexed 2024-03-10T04:46:20Z
format Article
id doaj.art-50d9c3d870724ecab7501cd9a214b24b
institution Directory Open Access Journal
issn 2072-4292
language English
last_indexed 2024-03-10T04:46:20Z
publishDate 2021-11-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj.art-50d9c3d870724ecab7501cd9a214b24b
2023-11-23T02:57:03Z
eng
MDPI AG
Remote Sensing
2072-4292
2021-11-01
Volume 13, Issue 23, Article 4832
doi:10.3390/rs13234832
Patrick Schratz (GIScience Group, Department of Geography, Friedrich Schiller University Jena, Loebdergraben 32, 07743 Jena, Germany)
Jannes Muenchow (GIScience Group, Department of Geography, Friedrich Schiller University Jena, Loebdergraben 32, 07743 Jena, Germany)
Eugenia Iturritxa (NEIKER Tecnalia, 48160 Tecnalia, Spain)
José Cortés (GIScience Group, Department of Geography, Friedrich Schiller University Jena, Loebdergraben 32, 07743 Jena, Germany)
Bernd Bischl (Department of Statistics, Ludwig-Maximilians-Universität München, Akademiestrasse 1/I, 80799 Munich, Germany)
Alexander Brenning (GIScience Group, Department of Geography, Friedrich Schiller University Jena, Loebdergraben 32, 07743 Jena, Germany)
title Monitoring Forest Health Using Hyperspectral Imagery: Does Feature Selection Improve the Performance of Machine-Learning Techniques?
topic hyperspectral imagery
forest health monitoring
machine learning
feature selection
model comparison
url https://www.mdpi.com/2072-4292/13/23/4832