Handling Missing Values Based on Similarity Classifiers and Fuzzy Entropy Measures

Handling missing values (MVs) and feature selection (FS) are vital preprocessing tasks for many pattern recognition, data mining, and machine learning (ML) applications involving classification and regression problems. MVs in data degrade decision making. Hence, MVs have to...


Bibliographic Details
Main Authors: Faten Khalid Karim, Hela Elmannai, Abdelrahman Seleem, Safwat Hamad, Samih M. Mostafa
Format: Article
Language: English
Published: MDPI AG 2022-11-01
Series: Electronics
Subjects:
Online Access: https://www.mdpi.com/2079-9292/11/23/3929
author Faten Khalid Karim
Hela Elmannai
Abdelrahman Seleem
Safwat Hamad
Samih M. Mostafa
collection DOAJ
description Handling missing values (MVs) and feature selection (FS) are vital preprocessing tasks for many pattern recognition, data mining, and machine learning (ML) applications involving classification and regression problems. MVs in data degrade decision making, so they must be treated as a critical problem during preprocessing. To this end, the authors propose a new algorithm that handles MVs using FS. Bayesian ridge regression (BRR), which the authors regard as the most beneficial type of Bayesian regression, estimates a probabilistic model of the regression problem. The proposed algorithm is dubbed cumulative Bayesian ridge with similarity and Luca’s fuzzy entropy measure (CBRSL). CBRSL shows how fuzzy entropy FS, used to select the candidate feature holding MVs, aids the prediction of the MVs within that feature using the Bayesian ridge technique. CBRSL handles MVs within the other features in a cumulative order: the filled features are incorporated into the BRR equation to predict the MVs of the next selected incomplete feature. An experimental analysis was conducted on four datasets holding MVs generated from three missingness mechanisms to compare CBRSL with state-of-the-art practical imputation methods. Performance was measured in terms of R² score (coefficient of determination), RMSE (root mean square error), and MAE (mean absolute error). The experimental results indicate that accuracy and execution time vary with the amount of MVs, the dataset’s size, and the missingness mechanism. In addition, the results show that CBRSL can handle MVs generated from any missingness mechanism with accuracy competitive with the compared methods.
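The cumulative scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes NumPy and scikit-learn's `BayesianRidge`, and the function names (`fuzzy_entropy`, `cumulative_bayesian_ridge_impute`, `imputation_scores`) are hypothetical. The ordering rule, a De Luca-Termini style entropy of min-max-scaled observed values with the lowest score first, is only a stand-in for the paper's similarity-based fuzzy entropy selection, which the abstract does not specify.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge


def fuzzy_entropy(values):
    """De Luca-Termini style entropy of min-max-scaled observed values.

    Stand-in ranking score (an assumption); NaNs are ignored and the
    feature is expected to have at least one observed value.
    """
    v = values[~np.isnan(values)]
    span = v.max() - v.min()
    if span == 0:
        return 0.0
    mu = np.clip((v - v.min()) / span, 1e-12, 1 - 1e-12)
    return float(-np.mean(mu * np.log(mu) + (1 - mu) * np.log(1 - mu)))


def cumulative_bayesian_ridge_impute(X):
    """Fill NaNs one feature at a time; each filled feature becomes a predictor."""
    X = np.asarray(X, dtype=float).copy()
    missing = np.isnan(X)
    complete = [j for j in range(X.shape[1]) if not missing[:, j].any()]
    incomplete = [j for j in range(X.shape[1]) if missing[:, j].any()]
    if not complete:
        raise ValueError("need at least one complete feature to start from")
    # Assumed ordering: lowest fuzzy entropy first.
    incomplete.sort(key=lambda j: fuzzy_entropy(X[:, j]))
    for j in incomplete:
        rows = missing[:, j]
        model = BayesianRidge().fit(X[~rows][:, complete], X[~rows, j])
        X[rows, j] = model.predict(X[rows][:, complete])
        complete.append(j)  # cumulative step: reuse the filled feature
    return X


def imputation_scores(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for imputed values against the ground truth."""
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = float(1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return r2, rmse, mae


# Synthetic demo: column 0 stays complete, columns 1 and 2 lose 20% at random.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2 * a + 0.1 * rng.normal(size=200)
c = a - b + 0.1 * rng.normal(size=200)
full = np.column_stack([a, b, c])
mask = np.zeros(full.shape, dtype=bool)
mask[:, 1:] = rng.random((200, 2)) < 0.2
holed = np.where(mask, np.nan, full)
filled = cumulative_bayesian_ridge_impute(holed)
print(imputation_scores(full[mask], filled[mask]))
```

The demo removes values completely at random, which corresponds to only one of the three missingness mechanisms the abstract mentions. The scores are computed on the masked entries only, since those are the values an imputation method actually has to recover.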
first_indexed 2024-03-09T17:50:23Z
format Article
id doaj.art-6f8bf6e733574dd59cb45938c6ec4693
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-09T17:50:23Z
publishDate 2022-11-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-6f8bf6e733574dd59cb45938c6ec4693 2023-11-24T10:47:52Z eng MDPI AG Electronics 2079-9292 2022-11-01 11 23 3929 10.3390/electronics11233929
Handling Missing Values Based on Similarity Classifiers and Fuzzy Entropy Measures
Faten Khalid Karim (0), Hela Elmannai (1), Abdelrahman Seleem (2), Safwat Hamad (3), Samih M. Mostafa (4)
(0) Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
(1) Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
(2) Computer Science Department, Faculty of Computers and Information, South Valley University, Qena 83523, Egypt
(3) Scientific Computing Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt
(4) Computer Science Department, Faculty of Computers and Information, South Valley University, Qena 83523, Egypt
https://www.mdpi.com/2079-9292/11/23/3929
missingness mechanisms; feature selection; bayesian ridge regression; imputation; similarity classifier
title Handling Missing Values Based on Similarity Classifiers and Fuzzy Entropy Measures
topic missingness mechanisms
feature selection
bayesian ridge regression
imputation
similarity classifier
url https://www.mdpi.com/2079-9292/11/23/3929