Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression
When investigating the causes of contamination in specific contexts, such as in underground water wells, multivariate regression is commonly used to establish possible links between the chemical-physical values of the samples and the levels of contaminant. Two issues often arise from such a statisti...
Main Authors: | Fernando Jimenez, Estrella Lucena-Sanchez, Gracia Sanchez, Guido Sciavicco |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
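Subjects: | Outlier detection, feature selection, evolutionary computation, multi-objective optimization, underground water contamination |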
Online Access: | https://ieeexplore.ieee.org/document/9548938/ |
_version_ | 1819116400863608832 |
---|---|
author | Fernando Jimenez Estrella Lucena-Sanchez Gracia Sanchez Guido Sciavicco |
author_facet | Fernando Jimenez Estrella Lucena-Sanchez Gracia Sanchez Guido Sciavicco |
author_sort | Fernando Jimenez |
collection | DOAJ |
description | When investigating the causes of contamination in specific contexts, such as in underground water wells, multivariate regression is commonly used to establish possible links between the chemical-physical values of the samples and the levels of contaminant. Two issues often arise from such a statistical analysis: selecting the best predicting variables and detecting the instances that can be suspected to be outliers. In this paper, we propose a comprehensive, integrated, and general optimization model that solves these two problems simultaneously in such a way that outliers can be detected in reference to the specific variables that are selected for the regression, and we implement such an optimization model with a well-known evolutionary algorithm. We test our proposal on data extracted from a project whose aim is to establish the causes of the contamination of underground water wells in a very specific area of northeastern Italy. The results show that our variable selection and outlier detection algorithm allows the synthesis of very reliable, interpretable, and clean regression models. |
first_indexed | 2024-12-22T05:16:30Z |
format | Article |
id | doaj.art-33800586cbbf472b9318cae82dec3725 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-22T05:16:30Z |
publishDate | 2021-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-33800586cbbf472b9318cae82dec3725; 2022-12-21T18:37:51Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 135675; 135688; 10.1109/ACCESS.2021.3115848; 9548938; Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression; Fernando Jimenez [0] https://orcid.org/0000-0001-5844-4163; Estrella Lucena-Sanchez [1]; Gracia Sanchez [2]; Guido Sciavicco [3] https://orcid.org/0000-0002-9221-879X; Department of Information Engineering and Communications, University of Murcia, Murcia, Spain; Department of Mathematics and Computer Science, University of Ferrara, Ferrara, Italy; Department of Information Engineering and Communications, University of Murcia, Murcia, Spain; Department of Mathematics and Computer Science, University of Ferrara, Ferrara, Italy; When investigating the causes of contamination in specific contexts, such as in underground water wells, multivariate regression is commonly used to establish possible links between the chemical-physical values of the samples and the levels of contaminant. Two issues often arise from such a statistical analysis: selecting the best predicting variables and detecting the instances that can be suspected to be outliers. In this paper, we propose a comprehensive, integrated, and general optimization model that solves these two problems simultaneously in such a way that outliers can be detected in reference to the specific variables that are selected for the regression, and we implement such an optimization model with a well-known evolutionary algorithm. We test our proposal on data extracted from a project whose aim is to establish the causes of the contamination of underground water wells in a very specific area of northeastern Italy. The results show that our variable selection and outlier detection algorithm allows the synthesis of very reliable, interpretable, and clean regression models.; https://ieeexplore.ieee.org/document/9548938/; Outlier detection; feature selection; evolutionary computation; multi-objective optimization; underground water contamination |
spellingShingle | Fernando Jimenez Estrella Lucena-Sanchez Gracia Sanchez Guido Sciavicco Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression IEEE Access Outlier detection feature selection evolutionary computation multi-objective optimization underground water contamination |
title | Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression |
title_full | Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression |
title_fullStr | Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression |
title_full_unstemmed | Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression |
title_short | Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression |
title_sort | multi objective evolutionary simultaneous feature selection and outlier detection for regression |
topic | Outlier detection feature selection evolutionary computation multi-objective optimization underground water contamination |
url | https://ieeexplore.ieee.org/document/9548938/ |
work_keys_str_mv | AT fernandojimenez multiobjectiveevolutionarysimultaneousfeatureselectionandoutlierdetectionforregression AT estrellalucenasanchez multiobjectiveevolutionarysimultaneousfeatureselectionandoutlierdetectionforregression AT graciasanchez multiobjectiveevolutionarysimultaneousfeatureselectionandoutlierdetectionforregression AT guidosciavicco multiobjectiveevolutionarysimultaneousfeatureselectionandoutlierdetectionforregression |