An Integrated Novel Framework for Coping Missing Values Imputation and Classification

Bibliographic Details
Main Authors: Monalisa Jena, Satchidananda Dehuri
Format: Article
Language: English
Published: IEEE, 2022-01-01
Series: IEEE Access
Subjects: Classification, data mining, decision tree, kNN classifier, missing values imputation, SVM
Online Access: https://ieeexplore.ieee.org/document/9810963/
Volume: 10
Pages: 69373-69387
DOI: 10.1109/ACCESS.2022.3187412
ISSN: 2169-3536
Affiliation: Department of Computer Science, Fakir Mohan University, Balasore, Odisha, India (both authors)
ORCID: Monalisa Jena, https://orcid.org/0000-0002-6687-075X
Collection: DOAJ (Directory of Open Access Journals)

Description
This work presents an integrated framework, DT-SVM, for imputing missing values and predicting the class labels of unseen samples by combining the strengths of a rule-based inductive decision tree (DT) and a Support Vector Machine (SVM) classifier. The decision tree imputes the missing values of datasets containing both categorical and numerical attributes. For comparison, several other popular and simple imputation techniques are also applied: drop, mean, median, mode, and k-nearest neighbor (kNN). The imputed datasets are then classified using SVM. The proposed DT-SVM framework has been compared with Drop-SVM, Mean-SVM, Median-SVM, Mode-SVM, and kNN-SVM, and DT-SVM is found to outperform the others.

Further, a new variant of kNN, named approximated kNN (A-kNN), is proposed to overcome some of the shortcomings of canonical kNN when learning from a training set imputed by DT. Unlike canonical kNN, A-kNN does not scan the entire training set; instead, it processes only a subset of representative instances from the training data to identify the nearest neighbor. These representative instances are selected using a class centroid approach. A-kNN is evaluated against canonical kNN in terms of both accuracy and computational time, and its computational time is found to be drastically lower than that of canonical kNN without compromising classification accuracy.
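
The abstract does not specify how the decision tree produces an imputed value. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: rows are dicts with `None` marking a missing entry; each column with gaps is predicted by a small CART-style tree (n-weighted Gini impurity for a categorical column, squared error for a numeric one) trained on the rows where that column is observed; and, as a simplifying assumption, gaps in the predictor columns are first filled provisionally with the column mean or mode (that is, the Mean/Mode baselines). The names `build_tree`, `predict`, and `dt_impute`, the `max_depth`/`min_size` defaults, and the toy data are all hypothetical.

```python
from collections import Counter
from statistics import mean


def _impurity(values, numeric):
    """Squared error for a numeric target, n-weighted Gini for a categorical one."""
    if numeric:
        m = mean(values)
        return sum((v - m) ** 2 for v in values)
    n = len(values)
    return n * (1.0 - sum((c / n) ** 2 for c in Counter(values).values()))


def _leaf(values, numeric):
    """Leaf prediction: mean (regression) or majority label (classification)."""
    return mean(values) if numeric else Counter(values).most_common(1)[0][0]


def build_tree(X, y, numeric, depth=0, max_depth=3, min_size=2):
    """Greedy CART-style tree; X is a list of feature dicts, y the target values."""
    best = (_impurity(y, numeric), None, None, None)  # (score, feature, test value, numeric?)
    if depth < max_depth and len(y) >= min_size and len(set(y)) > 1:
        for f in X[0]:
            is_num = isinstance(X[0][f], (int, float))
            for t in set(row[f] for row in X):
                mask = [(row[f] <= t) if is_num else (row[f] == t) for row in X]
                left = [v for v, m in zip(y, mask) if m]
                right = [v for v, m in zip(y, mask) if not m]
                if left and right:
                    score = _impurity(left, numeric) + _impurity(right, numeric)
                    if score < best[0]:
                        best = (score, f, t, is_num)
    _, f, t, is_num = best
    if f is None:  # no split lowers impurity, or a stopping rule fired
        return _leaf(y, numeric)
    mask = [(row[f] <= t) if is_num else (row[f] == t) for row in X]
    kids = [build_tree([r for r, m in zip(X, mask) if m == side],
                       [v for v, m in zip(y, mask) if m == side],
                       numeric, depth + 1, max_depth, min_size)
            for side in (True, False)]
    return {"f": f, "t": t, "num": is_num, "l": kids[0], "r": kids[1]}


def predict(tree, row):
    """Follow the split tests from the root down to a leaf value."""
    while isinstance(tree, dict):
        v = row[tree["f"]]
        go_left = (v <= tree["t"]) if tree["num"] else (v == tree["t"])
        tree = tree["l"] if go_left else tree["r"]
    return tree


def dt_impute(rows, max_depth=3):
    """Fill each None with a per-column tree trained on rows where that column is observed."""
    cols = list(rows[0])
    numeric = {c: all(isinstance(r[c], (int, float)) for r in rows if r[c] is not None)
               for c in cols}
    # Assumption: predictor gaps get a provisional mean/mode fill before tree fitting.
    fill = {}
    for c in cols:
        seen = [r[c] for r in rows if r[c] is not None]
        fill[c] = mean(seen) if numeric[c] else Counter(seen).most_common(1)[0][0]
    base = [{c: (fill[c] if r[c] is None else r[c]) for c in cols} for r in rows]
    out = [dict(r) for r in base]
    for c in cols:
        missing = [i for i, r in enumerate(rows) if r[c] is None]
        if not missing:
            continue
        observed = [i for i, r in enumerate(rows) if r[c] is not None]

        def feats(i, c=c):  # every other column is a predictor for column c
            return {k: base[i][k] for k in cols if k != c}

        tree = build_tree([feats(i) for i in observed], [rows[i][c] for i in observed],
                          numeric[c], max_depth=max_depth)
        for i in missing:
            out[i][c] = predict(tree, feats(i))
    return out


# Toy mixed-type data (hypothetical); None marks a missing entry.
rows = [
    {"age": 25, "income": "low", "owns": "no"},
    {"age": 30, "income": "low", "owns": "no"},
    {"age": 45, "income": "high", "owns": "yes"},
    {"age": 50, "income": "high", "owns": "yes"},
    {"age": 28, "income": None, "owns": "no"},
    {"age": None, "income": "high", "owns": "yes"},
]
imputed = dt_impute(rows)
print(imputed[4]["income"], imputed[5]["age"])
```

In this sketch a numeric column is filled with the mean of the tree leaf it falls into and a categorical column with the leaf's majority label, so one routine covers mixed attribute types. The imputed rows could then be passed to any SVM implementation to obtain the DT-SVM pipeline described above.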
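
The record states only that A-kNN compares a query with representative instances chosen by a class centroid approach instead of scanning the whole training set. One plausible reading is sketched below in Python as an assumption, not as the authors' algorithm: compute each class's centroid once, keep the `per_class` training instances nearest to it, and run the usual Euclidean kNN majority vote over that reduced pool. The names `select_representatives`, `a_knn_predict`, and `per_class` and the toy data are hypothetical, and the DT-imputed features are assumed to be numeric vectors.

```python
from collections import Counter, defaultdict
from math import dist


def knn_predict(train, query, k=3):
    """Canonical kNN: measures the distance from `query` to every training pair."""
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def select_representatives(train, per_class=2):
    """Hypothetical class-centroid step: for each class, keep the `per_class`
    training instances closest to that class's mean vector (centroid)."""
    by_class = defaultdict(list)
    for x, label in train:
        by_class[label].append(x)
    reps = []
    for label, xs in by_class.items():
        centroid = [sum(col) / len(xs) for col in zip(*xs)]
        closest = sorted(xs, key=lambda x: dist(x, centroid))[:per_class]
        reps.extend((x, label) for x in closest)
    return reps


def a_knn_predict(reps, query, k=3):
    """A-kNN sketch: the same majority vote, but scanning only the representatives."""
    return knn_predict(reps, query, k)


# Toy 2-D training set with two well-separated classes (hypothetical data).
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.8, 1.2), "a"), ((1.5, 1.5), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.8), "b"), ((4.8, 5.2), "b"), ((4.5, 4.5), "b"),
]
reps = select_representatives(train, per_class=2)  # built once, before any query
for q in [(1.1, 0.9), (4.9, 5.0)]:
    print(q, knn_predict(train, q), a_knn_predict(reps, q))
```

With this design each query costs `per_class` times the number of classes distance computations rather than one per training instance, which is where a time saving would come from; the centroid selection scans the training set only once, before any query. How many representatives the paper keeps per class, and whether it uses real instances or the centroids themselves, is not stated in this record.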