An Integrated Novel Framework for Coping Missing Values Imputation and Classification
This work presents an integrated framework for imputation of missing values and prediction of the class labels of unseen samples by using the best features of a rule-based inductive decision tree (DT) and a Support Vector Machine (SVM) classifier (DT-SVM). In this work, the decision tree is used for imputati...
Main Authors: | Monalisa Jena, Satchidananda Dehuri |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2022-01-01 |
Series: | IEEE Access |
Subjects: | Classification; data mining; decision tree; kNN classifier; missing values imputation; SVM |
Online Access: | https://ieeexplore.ieee.org/document/9810963/ |
_version_ | 1828150114195079168 |
---|---|
author | Monalisa Jena; Satchidananda Dehuri |
author_sort | Monalisa Jena |
collection | DOAJ |
description | This work presents an integrated framework, DT-SVM, for imputing missing values and predicting the class labels of unseen samples by combining the best features of a rule-based inductive decision tree (DT) and a Support Vector Machine (SVM) classifier. The decision tree imputes missing values in datasets containing both categorical and numerical attributes. Other popular and simple missing value imputation techniques, namely drop, mean, median, mode, and k-nearest neighbor (kNN), are used for a comparative analysis. The imputed datasets are then classified using SVM. The performance of the proposed DT-SVM framework is compared with Drop-SVM, Mean-SVM, Median-SVM, Mode-SVM, and kNN-SVM, and DT-SVM is found to outperform the others. Further, a new variant of kNN, named approximated kNN (A-kNN), is proposed to overcome some of the shortcomings of canonical kNN when learning from a training set imputed by DT. Unlike canonical kNN, A-kNN does not scan the entire training set. Instead, it processes some representative instances from the training dataset to identify the nearest neighbor. The class centroid approach is adopted to find the representative instances of the training set. The effectiveness of A-kNN, in terms of both accuracy and computational time, is examined by comparison with canonical kNN. The computational time of the proposed A-kNN is found to be drastically reduced compared to canonical kNN without compromising classification accuracy. |
first_indexed | 2024-04-11T21:38:49Z |
format | Article |
id | doaj.art-ceee8eb5d03443238e4f50adfa4d9d8e |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-11T21:38:49Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-ceee8eb5d03443238e4f50adfa4d9d8e; 2022-12-22T04:01:39Z; eng; IEEE; IEEE Access; 2169-3536; 2022-01-01; 10; 69373; 69387; 10.1109/ACCESS.2022.3187412; 9810963; An Integrated Novel Framework for Coping Missing Values Imputation and Classification; Monalisa Jena (0), https://orcid.org/0000-0002-6687-075X; Satchidananda Dehuri (1); (0) Department of Computer Science, Fakir Mohan University, Balasore, Odisha, India; (1) Department of Computer Science, Fakir Mohan University, Balasore, Odisha, India; https://ieeexplore.ieee.org/document/9810963/; Classification; data mining; decision tree; kNN classifier; missing values imputation; SVM |
title | An Integrated Novel Framework for Coping Missing Values Imputation and Classification |
topic | Classification; data mining; decision tree; kNN classifier; missing values imputation; SVM |
url | https://ieeexplore.ieee.org/document/9810963/ |
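
The description in this record outlines A-kNN only at a high level: representative instances are chosen with a class centroid approach, and nearest neighbors are searched among those representatives instead of the full training set. The sketch below is one possible reading of that idea, not the authors' implementation. The selection rule (keep the `per_class` instances closest to each class centroid), the Euclidean distance, and all function and parameter names are assumptions made for illustration; the record does not specify them. It also assumes purely numerical features with no missing values left, i.e. data that has already been imputed.

```python
import math
from collections import Counter, defaultdict


def class_centroids(X, y):
    """Return the mean feature vector of each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def select_representatives(X, y, per_class):
    """ASSUMPTION: keep the `per_class` instances nearest to each class centroid.

    The record only says that a class centroid approach is used; this
    particular selection rule is a guess made for illustration.
    """
    centroids = class_centroids(X, y)
    reps = []
    for label, centroid in centroids.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        members.sort(key=lambda row: math.dist(row, centroid))
        reps.extend((row, label) for row in members[:per_class])
    return reps


def predict(reps, query, k=3):
    """Majority vote among the k representatives closest to the query."""
    nearest = sorted(reps, key=lambda rep: math.dist(rep[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy, made-up data: two classes with one outlying point each.
    X = [[0, 0], [0.2, 0.1], [0.1, 0.3], [1.5, 1.5],
         [5, 5], [5.1, 4.9], [4.8, 5.2], [3.5, 3.5]]
    y = ["a", "a", "a", "a", "b", "b", "b", "b"]
    reps = select_representatives(X, y, per_class=2)
    print(len(reps), predict(reps, [0.3, 0.2]), predict(reps, [4.9, 5.0]))
```

In this sketch each query is compared against `per_class` times the number of classes instances rather than the whole training set, which is the kind of saving in computational time the description attributes to A-kNN. How accuracy is affected depends on how the representatives are chosen, which this record does not detail.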