Missing data imputation with hybrid feature selection for fertility dataset

Missing values pose a great concern in medical analysis, as they may alter the results of analysed data and cloud the judgement of the medical practitioner, ultimately affecting the precise treatment a patient should receive. Even though many imputation methods have been developed...

Bibliographic Details
Main Authors: Dzulkalnine, Mohamad Faiz, Sallehuddin, Roselina, Mohd. Zain, Azlan, Mohd. Radzi, Nor Haizan, Mustaffa, Noorfa Hazlinna
Format: Article
Published: Academy of Sciences Malaysia 2020
Subjects: QA75 Electronic computers. Computer science
collection ePrints
description Missing values pose a great concern in medical analysis, as they may alter the results of analysed data and cloud the judgement of the medical practitioner, ultimately affecting the precise treatment a patient should receive. Even though many imputation methods have been developed, the main issues with missing values, such as accuracy and bias in prediction, remain unsolved. In this paper, Fuzzy c-means (FCM) is employed as the imputation method. However, FCM does not account for irrelevant features. Noise and redundant data in the irrelevant features can reduce the accuracy of imputation and increase the computational time of FCM. One approach to tackling this problem is to use a feature selection method. By removing features that are irrelevant, the accuracy of imputation can be increased. Therefore, in this study, a hybrid imputation model, Principal Component Analysis-Support Vector Machines-FCM (PCA-SVM-FCM), is proposed. The effectiveness of the proposed model is tested on a medical dataset, the Fertility dataset. Its performance is then validated by comparing it with SVM-FCM. Experimental results demonstrated that the proposed model performs better than SVM-FCM, producing a much lower estimation error when measured using RMSE and MAE. The proposed model was then further verified using Theil's U test, producing a low U value, which indicates that it is sufficient and significant. Therefore, PCA-SVM-FCM can be a feasible imputation tool to assist medical practitioners in obtaining more reliable data analysis results.
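The three error measures named in the description (RMSE, MAE, and Theil's U) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the sample values are hypothetical, and the record does not say which form of Theil's U was used, so the bounded U1 form (0 means a perfect match, values near 1 mean a poor one) is assumed here.

```python
import math


def rmse(actual, imputed):
    """Root mean squared error between true and imputed values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / n)


def mae(actual, imputed):
    """Mean absolute error between true and imputed values."""
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)


def theil_u1(actual, imputed):
    """Theil's U in its U1 form (an assumption; the record names no variant).

    U1 = RMSE / (sqrt(mean(actual^2)) + sqrt(mean(imputed^2)))
    """
    n = len(actual)
    denom = (math.sqrt(sum(a * a for a in actual) / n)
             + math.sqrt(sum(p * p for p in imputed) / n))
    # Guard against an all-zero pair of series.
    return rmse(actual, imputed) / denom if denom else 0.0


# Hypothetical values: entries that were masked out, and what a model imputed.
actual = [0.5, 1.0, 0.0, 0.75]
imputed = [0.5, 0.8, 0.1, 0.75]

print("RMSE:", rmse(actual, imputed))
print("MAE:", mae(actual, imputed))
print("Theil's U1:", theil_u1(actual, imputed))
```

Lower values on all three measures indicate imputed entries closer to the true ones, which is the sense in which the description compares PCA-SVM-FCM with SVM-FCM.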
first_indexed 2024-03-05T20:49:45Z
format Article
id utm.eprints-90099
institution Universiti Teknologi Malaysia - ePrints
last_indexed 2024-03-05T20:49:45Z
publishDate 2020
publisher Academy of Sciences Malaysia
record_format dspace
spelling utm.eprints-90099 2021-03-31T06:38:12Z http://eprints.utm.my/90099/ Missing data imputation with hybrid feature selection for fertility dataset Dzulkalnine, Mohamad Faiz Sallehuddin, Roselina Mohd. Zain, Azlan Mohd. Radzi, Nor Haizan Mustaffa, Noorfa Hazlinna QA75 Electronic computers. Computer science
Academy of Sciences Malaysia 2020-02 Article PeerReviewed Dzulkalnine, Mohamad Faiz and Sallehuddin, Roselina and Mohd. Zain, Azlan and Mohd. Radzi, Nor Haizan and Mustaffa, Noorfa Hazlinna (2020) Missing data imputation with hybrid feature selection for fertility dataset. ASM Science Journal, 13 . pp. 1-6. ISSN 1823-6782 http://dx.doi.org/10.32802/asmscj.2020.sm26(5.23) DOI:10.32802/asmscj.2020.sm26(5.23)
topic QA75 Electronic computers. Computer science