Systematic review of using machine learning in imputing missing values
Missing data are a universal data quality problem in many domains, leading to misleading analysis and inaccurate decisions. Much research has been done to investigate the different mechanisms of missing data and the proper techniques in handling various data types. In the last decade, machine learni...
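Main Authors: | Alabadla, Mustafa; Sidi, Fatimah; Ishak, Iskandar; Ibrahim, Hamidah; Affendey, Lilly Suriani; Che Ani, Zafienas; A. Jabar, Marzanah; Bukar, Umar Ali; Devaraj, Navin Kumar; Muda, Ahmad Sobri; Tharek, Anas; Omar, Noritah; Mohd Jaya, Mohd Izham |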
---|---|
Format: | Article |
Published: | Institute of Electrical and Electronics Engineers, 2022 |
_version_ | 1825938849427095552 |
---|---|
author | Alabadla, Mustafa; Sidi, Fatimah; Ishak, Iskandar; Ibrahim, Hamidah; Affendey, Lilly Suriani; Che Ani, Zafienas; A. Jabar, Marzanah; Bukar, Umar Ali; Devaraj, Navin Kumar; Muda, Ahmad Sobri; Tharek, Anas; Omar, Noritah; Mohd Jaya, Mohd Izham |
author_sort | Alabadla, Mustafa |
collection | UPM |
description | Missing data are a universal data quality problem in many domains, leading to misleading analysis and inaccurate decisions. Much research has been done to investigate the different mechanisms of missing data and the proper techniques in handling various data types. In the last decade, machine learning has been utilized to replace conventional methods to address the problem of missing values more efficiently. By studying and analyzing recently proposed methods using machine learning approaches, vital adoptions in accuracy, performance, and time consumed can be highlighted. This study aimed to help data analysts and researchers address the limitations of machine learning imputation methods by conducting a systematic literature review to provide a comprehensive overview of using such methods to impute missing values. Novel proposed machine learning approaches used for data imputation are analyzed and summarized to assist researchers in selecting a proper machine learning method based on several factors and settings. The review was performed on research studies published between 2016 and 2021 on adopting machine learning to impute missing values, focusing on their strengths and limitations. A total of 684 research articles from various scientific databases were analyzed using search engines, and 94 of them were selected as primary studies. Finally, several recommendations were given to guide future researchers in applying machine learning to impute missing values. |
first_indexed | 2024-03-06T11:18:26Z |
format | Article |
id | upm.eprints-103422 |
institution | Universiti Putra Malaysia |
last_indexed | 2024-03-06T11:18:26Z |
publishDate | 2022 |
publisher | Institute of Electrical and Electronics Engineers |
record_format | dspace |
spelling | upm.eprints-103422 2023-06-13T03:01:52Z http://psasir.upm.edu.my/id/eprint/103422/ Institute of Electrical and Electronics Engineers 2022 Article PeerReviewed Alabadla, Mustafa and Sidi, Fatimah and Ishak, Iskandar and Ibrahim, Hamidah and Affendey, Lilly Suriani and Che Ani, Zafienas and A. Jabar, Marzanah and Bukar, Umar Ali and Devaraj, Navin Kumar and Muda, Ahmad Sobri and Tharek, Anas and Omar, Noritah and Mohd Jaya, Mohd Izham (2022) Systematic review of using machine learning in imputing missing values. IEEE Access, 10. 44483-44502. ISSN 2169-3536 https://ieeexplore.ieee.org/document/9762231 doi:10.1109/ACCESS.2022.3160841 |
title | Systematic review of using machine learning in imputing missing values |