CBRG: A Novel Algorithm for Handling Missing Data Using Bayesian Ridge Regression and Feature Selection Based on Gain Ratio

Bibliographic Details
Main Authors: Samih M. Mostafa, Abdelrahman S. Eladimy, Safwat Hamad, Hirofumi Amano
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Bayesian ridge regression; imputation; gain ratio; missingness mechanisms; missing value
Online Access:https://ieeexplore.ieee.org/document/9277540/
author Samih M. Mostafa
Abdelrahman S. Eladimy
Safwat Hamad
Hirofumi Amano
collection DOAJ
description Existing imputation methods may produce biased predictions and may either decrease or increase statistical influence, which leads to improper estimates. The performance of several missing-value imputation approaches depends on the size of the dataset and on the number of missing values it contains. In this work, the authors propose a novel algorithm for handling missing data and compare it with several common imputation approaches. The proposed algorithm imputes missing values cumulatively. Gain ratio (GR) feature selection chooses the next candidate feature to impute, and Bayesian ridge regression (BRR) builds the predictive model for that feature. Each imputed feature is then used to impute the missing values of the next selected candidate feature. The algorithm was evaluated on eight datasets after missing values were introduced at different proportions under the missingness mechanisms. Imputation performance was measured by imputation time, mean absolute error (MAE), coefficient of determination ($R^{2}$), and root-mean-square error (RMSE). The results indicate that the proposed algorithm imputes efficiently across the tested datasets, missing-data proportions, and missingness mechanisms.
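The description gives the procedure only at a high level. The Python/NumPy sketch below is therefore one possible reading of it under stated assumptions, not the authors' CBRG implementation. It makes three hypothetical choices:

- Each incomplete feature is scored by its gain ratio against a fully observed reference column, `rank_col`, after equal-frequency discretization.
- Features are imputed from highest to lowest gain ratio.
- Each imputed column then joins the predictor set for the next one.

BRR is written out with the usual evidence-maximization updates, closely following the scheme documented for scikit-learn's `BayesianRidge`, so that the block depends only on NumPy. All names (`cbrg_style_impute`, `fit_bayesian_ridge`, `make_demo_data`, `rank_col`, the bin count) are illustrative. The synthetic data with MCAR missingness stand in for the paper's eight datasets.

```python
import time

import numpy as np


def fit_bayesian_ridge(X, y, n_iter=300, tol=1e-6, a1=1e-6, a2=1e-6, l1=1e-6, l2=1e-6):
    """Bayesian ridge regression by evidence maximization; returns (coef, intercept).

    a1, a2: Gamma hyperprior on the noise precision alpha; l1, l2: on the weight precision lam.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    eig, Uty = S ** 2, U.T @ yc
    alpha, lam = 1.0 / (np.var(yc) + 1e-12), 1.0  # initial noise / weight precisions
    coef = np.zeros(Xc.shape[1])
    for _ in range(n_iter):
        # Posterior mean (lam/alpha * I + Xc^T Xc)^-1 Xc^T yc, computed through the SVD.
        new_coef = Vt.T @ (S / (eig + lam / alpha) * Uty)
        gamma = np.sum(alpha * eig / (lam + alpha * eig))  # effective number of parameters
        sse = np.sum((yc - Xc @ new_coef) ** 2)
        lam = (gamma + 2 * l1) / (np.sum(new_coef ** 2) + 2 * l2)
        alpha = (len(yc) - gamma + 2 * a1) / (sse + 2 * a2)
        converged = np.sum(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef, y_mean - x_mean @ coef


def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def gain_ratio(feature_bins, target_bins):
    """Information gain of target given feature, divided by the feature's split information."""
    cond = sum((feature_bins == v).mean() * entropy(target_bins[feature_bins == v])
               for v in np.unique(feature_bins))
    split_info = entropy(feature_bins)
    return (entropy(target_bins) - cond) / split_info if split_info > 0 else 0.0


def discretize(x, n_bins=4):
    """Equal-frequency bins, so continuous columns can be scored by gain ratio."""
    return np.digitize(x, np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1]))


def cbrg_style_impute(X, rank_col, n_bins=4):
    """Hypothetical cumulative, gain-ratio-ordered BRR imputation; rank_col must be fully observed."""
    X, missing = X.copy(), np.isnan(X)
    predictors = [j for j in range(X.shape[1]) if not missing[:, j].any()]
    incomplete = [j for j in range(X.shape[1]) if missing[:, j].any()]
    ref = discretize(X[:, rank_col], n_bins)
    score = {j: gain_ratio(discretize(X[~missing[:, j], j], n_bins), ref[~missing[:, j]])
             for j in incomplete}
    order = sorted(incomplete, key=lambda j: score[j], reverse=True)  # highest gain ratio first
    for j in order:
        obs = ~missing[:, j]
        coef, intercept = fit_bayesian_ridge(X[obs][:, predictors], X[obs, j])
        X[~obs, j] = X[~obs][:, predictors] @ coef + intercept
        predictors.append(j)  # cumulative step: the imputed column helps impute the next one
    return X, order


def errors(y_true, y_pred):
    """(MAE, RMSE, R^2) between true and imputed values."""
    resid = y_true - y_pred
    r2 = 1 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return float(np.mean(np.abs(resid))), float(np.sqrt(np.mean(resid ** 2))), float(r2)


def make_demo_data(n=500, rate=0.1, seed=0):
    """Correlated synthetic columns; MCAR holes in columns 1-3, column 0 stays complete."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    X_true = np.column_stack([z, 2 * z + rng.normal(scale=0.3, size=n),
                              -z + rng.normal(scale=0.5, size=n),
                              0.5 * z + rng.normal(scale=1.0, size=n)])
    mask = np.zeros(X_true.shape, dtype=bool)
    mask[:, 1:] = rng.random((n, 3)) < rate  # MCAR: missingness independent of the data
    X_miss = X_true.copy()
    X_miss[mask] = np.nan
    return X_true, X_miss, mask


if __name__ == "__main__":
    X_true, X_miss, mask = make_demo_data()
    t0 = time.perf_counter()
    X_imp, order = cbrg_style_impute(X_miss, rank_col=0)
    elapsed = time.perf_counter() - t0
    mae, rmse, r2 = errors(X_true[mask], X_imp[mask])
    print(f"order={order} time={elapsed:.4f}s MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

Running the script prints the imputation order, the wall-clock imputation time, and MAE, RMSE and R² over the masked cells. These numbers describe only the synthetic example and are not results from the paper. Where scikit-learn is available, its `BayesianRidge` estimator could replace `fit_bayesian_ridge`.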
first_indexed 2024-04-12T04:45:32Z
format Article
id doaj.art-2b1b59e2998b4f69be67d7af9e7ae634
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-12T04:45:32Z
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-2b1b59e2998b4f69be67d7af9e7ae634 (record updated 2022-12-22T03:47:31Z)
eng
IEEE
IEEE Access, ISSN 2169-3536
2020-01-01, vol. 8, pp. 216969-216985
DOI 10.1109/ACCESS.2020.3042119
IEEE Xplore document 9277540
CBRG: A Novel Algorithm for Handling Missing Data Using Bayesian Ridge Regression and Feature Selection Based on Gain Ratio
Samih M. Mostafa (https://orcid.org/0000-0001-9234-5898), Computer Science-Mathematics Department, Faculty of Science, South Valley University, Qena, Egypt
Abdelrahman S. Eladimy (https://orcid.org/0000-0003-3254-0872), Computer Science-Mathematics Department, Faculty of Science, South Valley University, Qena, Egypt
Safwat Hamad (https://orcid.org/0000-0002-1338-8724), Scientific Computing Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
Hirofumi Amano (https://orcid.org/0000-0002-8187-4337), Research Institute for Information Technology, Kyushu University, Fukuoka, Japan
title CBRG: A Novel Algorithm for Handling Missing Data Using Bayesian Ridge Regression and Feature Selection Based on Gain Ratio
topic Bayesian ridge regression
imputation
gain ratio
missingness mechanisms
missing value
url https://ieeexplore.ieee.org/document/9277540/