An evolutionary computation classification method for high‐dimensional mixed missing variables data

Bibliographic Details
Main Authors: Mengmeng Li, Yi Liu, Qibin Zheng, Gengsong Li, Wei Qin
Format: Article
Language: English
Published: Wiley 2023-12-01
Series: Electronics Letters
Subjects: evolutionary computation; feature selection; pattern classification
Online Access: https://doi.org/10.1049/ell2.13058
Collection: DOAJ (Directory of Open Access Journals)

Abstract
Missing data is a prevalent issue in many real‐world systems and can degrade the performance of the classification algorithms that run on them. Numerous effective imputation methods exist to address this problem. However, traditional data imputation approaches mainly focus on low‐dimensional missing data, and they do not exploit the randomness of the missing values and the label information simultaneously. To address these limitations, the authors propose a novel data imputation algorithm named Particle Swarm Optimization for High‐dimensional mixed Missing variables data (PSOHM). PSOHM first applies a feature filtering algorithm to perform feature selection on the missing data, followed by a feature discrimination method that categorizes the selected features. It then employs particle swarm optimization to optimize imputation functions for both continuous and discrete features. Each continuous feature is modelled as a Gaussian distribution whose mean and standard deviation are encoded into the particles; the value probabilities of each discrete feature are encoded as well. Classification accuracy serves as the optimization objective, so that both the randomness of the missing values and the label information are used to improve the algorithm's performance. Six typical algorithms are used for comparison, and the results show that the proposed method is superior to the compared approaches on six different kinds of classical datasets.

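The particle encoding and accuracy objective described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the feature filtering step is omitted; the discrimination rule (a feature with at most five distinct observed values is treated as discrete), the leave-one-out 1-nearest-neighbour classifier used to score accuracy, the PSO coefficients, and every function name (`build_spec`, `decode`, `impute`, `pso_impute`, etc.) are hypothetical. Each continuous feature with missing entries contributes a Gaussian mean and standard deviation to the particle, each discrete feature contributes one weight per observed value, and particles are ranked by the classification accuracy obtained after imputing from the decoded distributions.

```python
# Hypothetical sketch of PSO-optimized imputation in the spirit of PSOHM, not the authors' code.
import math
import random

MISSING = None  # assumed marker for a missing cell

def is_discrete(column, max_levels=5):
    """Assumed discrimination rule: few distinct observed values => discrete feature."""
    return len({v for v in column if v is not MISSING}) <= max_levels

def build_spec(X):
    """List (column, kind, levels) for every feature that contains missing values."""
    spec = []
    for j, col in enumerate(zip(*X)):
        if MISSING not in col:
            continue
        if is_discrete(col):
            spec.append((j, "disc", sorted({v for v in col if v is not MISSING})))
        else:
            spec.append((j, "cont", None))
    return spec

def init_position(X, spec, rng):
    """Random start: Gaussian means inside the observed range, random category weights."""
    pos = []
    for j, kind, levels in spec:
        obs = [row[j] for row in X if row[j] is not MISSING]  # assumes >= 1 observed value
        if kind == "cont":
            lo, hi = min(obs), max(obs)
            pos += [rng.uniform(lo, hi), rng.uniform(0.0, hi - lo + 1e-3)]
        else:
            pos += [rng.random() for _ in levels]
    return pos

def decode(pos, spec):
    """Map a flat particle position to one imputation distribution per feature."""
    params, i = {}, 0
    for j, kind, levels in spec:
        if kind == "cont":  # Gaussian: mean and std occupy two dimensions
            params[j] = ("cont", pos[i], abs(pos[i + 1]) + 1e-6)
            i += 2
        else:  # categorical: one non-negative weight per observed value
            params[j] = ("disc", levels, [abs(v) + 1e-6 for v in pos[i:i + len(levels)]])
            i += len(levels)
    return params

def impute(X, params, rng):
    """Return a copy of X with each missing cell sampled from its feature's distribution."""
    out = [list(row) for row in X]
    for row in out:
        for j, p in params.items():
            if row[j] is not MISSING:
                continue
            if p[0] == "cont":
                row[j] = rng.gauss(p[1], p[2])
            else:
                row[j] = rng.choices(p[1], weights=p[2])[0]
    return out

def loo_1nn_accuracy(X, y):
    """Leave-one-out 1-NN accuracy, an assumed stand-in for the paper's classifier."""
    hits = 0
    for i, a in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i), key=lambda k: math.dist(a, X[k]))
        hits += y[nearest] == y[i]
    return hits / len(X)

def pso_impute(X, y, n_particles=10, iters=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximizes classification accuracy on the imputed data."""
    rng, spec = random.Random(seed), build_spec(X)

    def fitness(pos):  # fixed sampling seed, so each position gets a deterministic score
        return loo_1nn_accuracy(impute(X, decode(pos, spec), random.Random(seed)), y)

    pos = [init_position(X, spec, rng) for _ in range(n_particles)]
    vel = [[0.0] * len(p) for p in pos]
    pbest, pbest_f = [p[:] for p in pos], [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(len(pos[k])):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d] + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            f = fitness(pos[k])
            if f > pbest_f[k]:  # update personal best, then global best
                pbest[k], pbest_f[k] = pos[k][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[k][:], f
    return impute(X, decode(gbest, spec), random.Random(seed)), gbest_f

# Toy data: column 0 is continuous, column 1 is a binary discrete feature.
X = [[1.0, 0], [1.2, 0], [None, 0], [0.9, None],
     [5.0, 1], [5.2, 1], [None, 1], [4.8, None]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
X_filled, acc = pso_impute(X, y)
```

Scoring every particle with the same sampling seed keeps the fitness deterministic, so personal and global bests remain comparable across iterations; drawing fresh samples on each evaluation would expose the swarm more directly to the randomness of the imputed values but makes the objective noisy. The abstract does not say which approach PSOHM takes.
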
ISSN: 0013-5194, 1350-911X
Volume: 59
Issue: 24
Affiliations: Mengmeng Li, Yi Liu, Qibin Zheng and Wei Qin, Academy of Military Sciences, Beijing, China; Gengsong Li, National Innovation Institute of Defense Technology, Beijing, China