Missing Data Analysis in Regression

Many real-world datasets are incomplete. In this paper, we examine the effects of incomplete databases on regression and possible remedies, aiming to bridge a gap between theoretically effective algorithms. We investigated the actual effects of missing data for regre...


Bibliographic Details
Main Authors: C. G. Marcelino, G. M. C. Leite, P. Celes, C. E. Pedreira
Format: Article
Language: English
Published: Taylor & Francis Group 2022-12-01
Series: Applied Artificial Intelligence
Online Access: http://dx.doi.org/10.1080/08839514.2022.2032925
collection DOAJ
description Many real-world datasets are incomplete. In this paper, we examine the effects of incomplete databases on regression and possible remedies, aiming to bridge a gap between theoretically effective algorithms. We investigated the actual effects of missing data on regression by analyzing its impact on several publicly available databases, using popular algorithms such as Decision Trees, Random Forests, AdaBoost, K-Nearest Neighbors, Support Vector Machines, and Neural Networks. Our goal is to offer a systematic view of how missing data may affect regression results. After exhaustive simulations on eight public datasets from UCI and KEEL (Abalone, Airfoil, Bike, California, Compactiv, Mortgage, Wankara, and Wine), we concluded that the effect of missing data may be significant. The results showed that K-Nearest Neighbors performed better than the other algorithms for regression on data with missing values.
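The kind of experiment the description outlines (remove values from a dataset, then measure how a regressor's error changes) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' protocol: the synthetic data, the completely-at-random removal in `inject_mcar`, the mean imputation, the choice of k = 5, and all function names are hypothetical choices made here for the example. It uses only the Python standard library.

```python
import math
import random


def make_data(n, rng):
    """Synthetic data y = 3*x0 - 2*x1 + noise (an assumption, not a dataset from the paper)."""
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(n)]
    y = [3 * a - 2 * b + rng.gauss(0, 0.05) for a, b in X]
    return X, y


def inject_mcar(X, rate, rng):
    """Replace each feature value with None independently with probability `rate`."""
    return [[None if rng.random() < rate else v for v in row] for row in X]


def column_means(X):
    """Mean of the observed (non-None) values in each column; 0.0 if none observed."""
    means = []
    for j in range(len(X[0])):
        observed = [row[j] for row in X if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def impute(X, means):
    """Fill every None with the mean of its column (assumed imputation strategy)."""
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]


def knn_predict(X_train, y_train, x, k=5):
    """Average the targets of the k training points closest to x (Euclidean distance)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return sum(y_train[i] for i in order[:k]) / k


def rmse(X_train, y_train, X_test, y_test, k=5):
    """Root mean squared error of the k-NN predictions on the test set."""
    errors = [(knn_predict(X_train, y_train, x, k) - t) ** 2
              for x, t in zip(X_test, y_test)]
    return math.sqrt(sum(errors) / len(errors))


def run_experiment(rates=(0.0, 0.1, 0.3), seed=0):
    """Test RMSE of k-NN after removing training values at each missing rate."""
    rng = random.Random(seed)
    X, y = make_data(300, rng)
    X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
    results = {}
    for rate in rates:
        X_missing = inject_mcar(X_train, rate, rng)
        X_imputed = impute(X_missing, column_means(X_missing))
        results[rate] = rmse(X_imputed, y_train, X_test, y_test)
    return results


if __name__ == "__main__":
    for rate, err in run_experiment().items():
        print(f"missing rate {rate:.0%}: test RMSE {err:.3f}")
```

Only the training features are degraded here, so any change in test error is attributable to the missing values and their imputation. Swapping in another regressor or imputation strategy would follow the same loop.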
first_indexed 2024-03-11T13:40:09Z
id doaj.art-3922d87b802e4e8fa1d111cde318346a
institution Directory Open Access Journal
issn 0883-9514
1087-6545
last_indexed 2024-03-11T13:40:09Z
spelling doaj.art-3922d87b802e4e8fa1d111cde318346a; 2023-11-02T13:36:38Z; eng; Taylor & Francis Group; Applied Artificial Intelligence; ISSN 0883-9514, 1087-6545; 2022-12-01; vol. 36, no. 1; DOI 10.1080/08839514.2022.2032925; article 2032925; Missing Data Analysis in Regression; C. G. Marcelino, G. M. C. Leite, P. Celes, C. E. Pedreira (all Federal University of Rio de Janeiro (UFRJ)); http://dx.doi.org/10.1080/08839514.2022.2032925