Iterative Robust Semi-Supervised Missing Data Imputation

Bibliographic Details
Main Authors: Nikos Fazakis, Georgios Kostopoulos, Sotiris Kotsiantis, Iosif Mporas
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9091515/
ISSN: 2169-3536
Collection: Directory of Open Access Journals (DOAJ)

Abstract
In many real-world applications, scientists are often confronted with incomplete datasets, for a variety of reasons. Directly analyzing datasets with missing attribute values inevitably results in inaccurate learning models and erroneous results. Handling missing values effectively is an essential step of the data mining process. Imputation is often employed during the preprocessing stage of data analysis to overcome the shortcomings incurred by missing data. Therefore, a plethora of statistical and machine learning methods have been proposed and employed with a view to imputing the missing values in incomplete data with their potential or actual values. In this context, the main objective of this paper is to put forward an iterative stepwise imputation method based on the semi-supervised learning approach, called IRSSI. Semi-supervised methods have proved to be particularly effective for exploiting incomplete or partially labeled data with regard to the values of the target attribute. The proposed algorithm was experimentally evaluated on real-world benchmark datasets and artificially generated datasets with various high ratios of missing data. The experimental results demonstrate the efficiency of the IRSSI algorithm compared to typical imputation methods.

Publication Details
Volume: 8
Pages: 90555-90569
DOI: 10.1109/ACCESS.2020.2994033
Author affiliations:
Nikos Fazakis (ORCID: https://orcid.org/0000-0001-7687-2380), Department of Electrical and Computer Engineering, University of Patras, Rion, Greece
Georgios Kostopoulos, Department of Mathematics, University of Patras, Rion, Greece
Sotiris Kotsiantis (ORCID: https://orcid.org/0000-0002-2247-3082), Department of Mathematics, University of Patras, Rion, Greece
Iosif Mporas (ORCID: https://orcid.org/0000-0001-6984-0268), School of Engineering and Computer Science, University of Hertfordshire, Hatfield, U.K.
Keywords: Missing values; imputation; classification; semi-supervised learning