Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression

When investigating the causes of contamination in specific contexts, such as in underground water wells, multivariate regression is commonly used to establish possible links between the chemical-physical values of the samples and the levels of contaminant. Two issues often arise from such a statisti...

Bibliographic Details
Main Authors: Fernando Jimenez, Estrella Lucena-Sanchez, Gracia Sanchez, Guido Sciavicco
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
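Subjects: Outlier detection; feature selection; evolutionary computation; multi-objective optimization; underground water contamination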
Online Access: https://ieeexplore.ieee.org/document/9548938/
_version_ 1819116400863608832
author Fernando Jimenez
Estrella Lucena-Sanchez
Gracia Sanchez
Guido Sciavicco
collection DOAJ
description When investigating the causes of contamination in specific contexts, such as in underground water wells, multivariate regression is commonly used to establish possible links between the chemical-physical values of the samples and the levels of contaminant. Two issues often arise from such a statistical analysis: selecting the best predicting variables and detecting the instances that can be suspected to be outliers. In this paper, we propose a comprehensive, integrated, and general optimization model that solves these two problems simultaneously in such a way that outliers can be detected in reference to the specific variables that are selected for the regression, and we implement such an optimization model with a well-known evolutionary algorithm. We test our proposal on data extracted from a project whose aim is to establish the causes of the contamination of underground water wells in a very specific area of northeastern Italy. The results show that our variable selection and outlier detection algorithm allows the synthesis of very reliable, interpretable, and clean regression models.
first_indexed 2024-12-22T05:16:30Z
format Article
id doaj.art-33800586cbbf472b9318cae82dec3725
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-22T05:16:30Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-33800586cbbf472b9318cae82dec3725
2022-12-21T18:37:51Z
Language: eng
Publisher: IEEE
Series: IEEE Access
ISSN: 2169-3536
Published: 2021-01-01
Volume: 9
Pages: 135675-135688
DOI: 10.1109/ACCESS.2021.3115848
IEEE document number: 9548938
Fernando Jimenez (https://orcid.org/0000-0001-5844-4163), Department of Information Engineering and Communications, University of Murcia, Murcia, Spain
Estrella Lucena-Sanchez, Department of Mathematics and Computer Science, University of Ferrara, Ferrara, Italy
Gracia Sanchez, Department of Information Engineering and Communications, University of Murcia, Murcia, Spain
Guido Sciavicco (https://orcid.org/0000-0002-9221-879X), Department of Mathematics and Computer Science, University of Ferrara, Ferrara, Italy
title Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression
topic Outlier detection
feature selection
evolutionary computation
multi-objective optimization
underground water contamination
url https://ieeexplore.ieee.org/document/9548938/