Missing Data Imputation for Supervised Learning

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks. We experiment on two machine learning benchmark datasets with...

Bibliographic Details
Main Authors:	Jason Poulos, Rafael Valle
Format:	Article
Language:	English
Published:	Taylor & Francis Group 2018-04-01
Series:	Applied Artificial Intelligence
Online Access:	http://dx.doi.org/10.1080/08839514.2018.1448143

ISSN:	0883-9514, 1087-6545
Volume:	32
Issue:	2
Pages:	186-196
Author Affiliations:	Jason Poulos and Rafael Valle, Departments of Political Science and Electrical Engineering and Computer Sciences, University of California
Collection:	DOAJ (Directory of Open Access Journals)
Record ID:	doaj.art-84be66ac51d443ac9c475ee20c821bec
Record Indexed:	2024-03-12
Abstract:	Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with different levels of additional missing-data perturbation. We show imputation methods can increase predictive accuracy in the presence of missing-data perturbation, which can actually improve prediction accuracy by regularizing the classifier. We achieve results comparable to the state-of-the-art on the Adult dataset with missing-data perturbation and $k$-nearest-neighbors ($k$-NN) imputation.


Similar Items