Iterative Robust Semi-Supervised Missing Data Imputation
In many real-world applications, scientists are confronted with incomplete datasets for a variety of reasons. The direct analysis of datasets with missing attribute values inevitably results in inaccurate learning models and erroneous results. Effectively addressing the challenge o...
Main Authors: | Nikos Fazakis, Georgios Kostopoulos, Sotiris Kotsiantis, Iosif Mporas |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Missing values, imputation, classification, semi-supervised learning |
Online Access: | https://ieeexplore.ieee.org/document/9091515/ |
_version_ | 1828406329935396864 |
---|---|
author | Nikos Fazakis, Georgios Kostopoulos, Sotiris Kotsiantis, Iosif Mporas |
author_facet | Nikos Fazakis, Georgios Kostopoulos, Sotiris Kotsiantis, Iosif Mporas |
author_sort | Nikos Fazakis |
collection | DOAJ |
description | In many real-world applications, scientists are confronted with incomplete datasets for a variety of reasons. The direct analysis of datasets with missing attribute values inevitably results in inaccurate learning models and erroneous results. Effectively addressing the challenge of missing values is an essential step of the data mining process. Imputation is often employed to overcome the shortcomings incurred by missing data during the preprocessing stage of data analysis. Accordingly, a plethora of statistical and machine learning methods have been proposed and employed to impute the missing values in incomplete data with their potential or actual values. In this context, the main objective of this paper is to put forward an iterative stepwise imputation method based on the semi-supervised learning approach, called IRSSI. Semi-supervised methods have proved to be particularly effective for exploiting incomplete or partially labeled data with regard to the values of the target attribute. The proposed algorithm was experimentally evaluated on real-world benchmark datasets and artificially generated datasets using different high ratios of missing data. The experimental results demonstrate the efficiency of the IRSSI algorithm compared to typical imputation methods. |
first_indexed | 2024-12-10T11:10:10Z |
format | Article |
id | doaj.art-ed03ef9dd3df4bbe8f5026564b67e620 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-10T11:10:10Z |
publishDate | 2020-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-ed03ef9dd3df4bbe8f5026564b67e620 2022-12-22T01:51:27Z eng IEEE IEEE Access 2169-3536 2020-01-01 8 90555 90569 10.1109/ACCESS.2020.2994033 9091515 Iterative Robust Semi-Supervised Missing Data Imputation Nikos Fazakis 0 https://orcid.org/0000-0001-7687-2380 Georgios Kostopoulos 1 Sotiris Kotsiantis 2 https://orcid.org/0000-0002-2247-3082 Iosif Mporas 3 https://orcid.org/0000-0001-6984-0268 Department of Electrical and Computer Engineering, University of Patras, Rion, Greece Department of Mathematics, University of Patras, Rion, Greece Department of Mathematics, University of Patras, Rion, Greece School of Engineering and Computer Science, University of Hertfordshire, Hatfield, U.K. In many real-world applications, scientists are confronted with incomplete datasets for a variety of reasons. The direct analysis of datasets with missing attribute values inevitably results in inaccurate learning models and erroneous results. Effectively addressing the challenge of missing values is an essential step of the data mining process. Imputation is often employed to overcome the shortcomings incurred by missing data during the preprocessing stage of data analysis. Accordingly, a plethora of statistical and machine learning methods have been proposed and employed to impute the missing values in incomplete data with their potential or actual values. In this context, the main objective of this paper is to put forward an iterative stepwise imputation method based on the semi-supervised learning approach, called IRSSI. Semi-supervised methods have proved to be particularly effective for exploiting incomplete or partially labeled data with regard to the values of the target attribute. The proposed algorithm was experimentally evaluated on real-world benchmark datasets and artificially generated datasets using different high ratios of missing data. The experimental results demonstrate the efficiency of the IRSSI algorithm compared to typical imputation methods. https://ieeexplore.ieee.org/document/9091515/ Missing values imputation classification semi-supervised learning |
spellingShingle | Nikos Fazakis Georgios Kostopoulos Sotiris Kotsiantis Iosif Mporas Iterative Robust Semi-Supervised Missing Data Imputation IEEE Access Missing values imputation classification semi-supervised learning |
title | Iterative Robust Semi-Supervised Missing Data Imputation |
title_full | Iterative Robust Semi-Supervised Missing Data Imputation |
title_fullStr | Iterative Robust Semi-Supervised Missing Data Imputation |
title_full_unstemmed | Iterative Robust Semi-Supervised Missing Data Imputation |
title_short | Iterative Robust Semi-Supervised Missing Data Imputation |
title_sort | iterative robust semi supervised missing data imputation |
topic | Missing values, imputation, classification, semi-supervised learning |
url | https://ieeexplore.ieee.org/document/9091515/ |
work_keys_str_mv | AT nikosfazakis iterativerobustsemisupervisedmissingdataimputation AT georgioskostopoulos iterativerobustsemisupervisedmissingdataimputation AT sotiriskotsiantis iterativerobustsemisupervisedmissingdataimputation AT iosifmporas iterativerobustsemisupervisedmissingdataimputation |