Missing Data Analysis in Regression
Many of the datasets in real-world applications contain incompleteness. In this paper, we approach the effects and possible solutions to incomplete databases in regression, aiming to bridge a gap between theoretically effective algorithms. We investigated the actual effects of missing data for regre...
Main Authors: | C. G. Marcelino, G. M. C. Leite, P. Celes, C. E. Pedreira |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2022-12-01 |
Series: | Applied Artificial Intelligence |
Online Access: | http://dx.doi.org/10.1080/08839514.2022.2032925 |
_version_ | 1797641065168109568 |
---|---|
author | C. G. Marcelino, G. M. C. Leite, P. Celes, C. E. Pedreira |
author_facet | C. G. Marcelino, G. M. C. Leite, P. Celes, C. E. Pedreira |
author_sort | C. G. Marcelino |
collection | DOAJ |
description | Many of the datasets in real-world applications contain incompleteness. In this paper, we approach the effects and possible solutions to incomplete databases in regression, aiming to bridge a gap between theoretically effective algorithms. We investigated the actual effects of missing data for regression by analyzing its impact on several publicly available databases using popular algorithms such as Decision Trees, Random Forests, AdaBoost, K-Nearest Neighbors, Support Vector Machines, and Neural Networks. Our goal is to offer a systematic view of how missing data may affect regression results. After exhaustive simulations on eight public datasets from UCI and KEEL (Abalone, Airfoil, Bike, California, Compactiv, Mortgage, Wankara, and Wine), we concluded that the effect of missing data may be significant. The results showed that K-Nearest Neighbors performed better than the other algorithms in regression on data with missing values. |
first_indexed | 2024-03-11T13:40:09Z |
format | Article |
id | doaj.art-3922d87b802e4e8fa1d111cde318346a |
institution | Directory of Open Access Journals |
issn | 0883-9514 1087-6545 |
language | English |
last_indexed | 2024-03-11T13:40:09Z |
publishDate | 2022-12-01 |
publisher | Taylor & Francis Group |
record_format | Article |
series | Applied Artificial Intelligence |
spelling | doaj.art-3922d87b802e4e8fa1d111cde318346a; 2023-11-02T13:36:38Z; eng; Taylor & Francis Group; Applied Artificial Intelligence; 0883-9514; 1087-6545; 2022-12-01; 36; 1; 10.1080/08839514.2022.2032925; 2032925; Missing Data Analysis in Regression; C. G. Marcelino (0); G. M. C. Leite (1); P. Celes (2); C. E. Pedreira (3); Federal University of Rio de Janeiro (UFRJ); Federal University of Rio de Janeiro (UFRJ); Federal University of Rio de Janeiro (UFRJ); Federal University of Rio de Janeiro (UFRJ); Many of the datasets in real-world applications contain incompleteness. In this paper, we approach the effects and possible solutions to incomplete databases in regression, aiming to bridge a gap between theoretically effective algorithms. We investigated the actual effects of missing data for regression by analyzing its impact on several publicly available databases using popular algorithms such as Decision Trees, Random Forests, AdaBoost, K-Nearest Neighbors, Support Vector Machines, and Neural Networks. Our goal is to offer a systematic view of how missing data may affect regression results. After exhaustive simulations on eight public datasets from UCI and KEEL (Abalone, Airfoil, Bike, California, Compactiv, Mortgage, Wankara, and Wine), we concluded that the effect of missing data may be significant. The results showed that K-Nearest Neighbors performed better than the other algorithms in regression on data with missing values.; http://dx.doi.org/10.1080/08839514.2022.2032925 |
spellingShingle | C. G. Marcelino G. M. C. Leite P. Celes C. E. Pedreira Missing Data Analysis in Regression Applied Artificial Intelligence |
title | Missing Data Analysis in Regression |
title_full | Missing Data Analysis in Regression |
title_fullStr | Missing Data Analysis in Regression |
title_full_unstemmed | Missing Data Analysis in Regression |
title_short | Missing Data Analysis in Regression |
title_sort | missing data analysis in regression |
url | http://dx.doi.org/10.1080/08839514.2022.2032925 |
work_keys_str_mv | AT cgmarcelino missingdataanalysisinregression AT gmcleite missingdataanalysisinregression AT pceles missingdataanalysisinregression AT cepedreira missingdataanalysisinregression |