An evolutionary computation classification method for high‐dimensional mixed missing variables data
Abstract: Missing data is a prevalent issue in various real‐world systems. It may degrade the performance of classification algorithms running on these systems. Numerous effective imputation methods exist to address this problem. However, traditional data imputation approaches mainly focus on l...
Main Authors: | Mengmeng Li, Yi Liu, Qibin Zheng, Gengsong Li, Wei Qin |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2023-12-01 |
Series: | Electronics Letters |
Subjects: | evolutionary computation, feature selection, pattern classification |
Online Access: | https://doi.org/10.1049/ell2.13058 |
_version_ | 1797362151601471488 |
---|---|
author | Mengmeng Li Yi Liu Qibin Zheng Gengsong Li Wei Qin |
author_sort | Mengmeng Li |
collection | DOAJ |
description | Missing data is a prevalent issue in various real‐world systems. It may degrade the performance of classification algorithms running on these systems. Numerous effective imputation methods exist to address this problem. However, traditional data imputation approaches mainly focus on low‐dimensional missing data, and they do not use the randomness of the missing values and the label information simultaneously. To solve these problems, the authors propose a novel data imputation algorithm, named Particle Swarm Optimization for High‐dimensional mixed Missing variables data (PSOHM). PSOHM introduces a feature filtering algorithm for feature selection on missing data, followed by a feature discrimination method that categorizes the chosen features as continuous or discrete. PSOHM then employs particle swarm optimization to optimize imputation functions for both kinds of feature. Continuous features are modelled as Gaussian distributions, with the mean and standard deviation encoded into particles. For discrete features, the probabilities of their values are encoded in the same way. Classification accuracy serves as the optimization objective, so the search uses both the randomness of missing values and the label information to improve the algorithm's performance. Six typical algorithms are used for comparison. The results show that the authors’ method outperforms the compared approaches on six classical datasets of different kinds. |
first_indexed | 2024-03-08T16:04:16Z |
format | Article |
id | doaj.art-37ff32d549654147913611b1a63f698d |
institution | Directory Open Access Journal |
issn | 0013-5194 1350-911X |
language | English |
last_indexed | 2024-03-08T16:04:16Z |
publishDate | 2023-12-01 |
publisher | Wiley |
record_format | Article |
series | Electronics Letters |
spelling | doaj.art-37ff32d549654147913611b1a63f698d; 2024-01-08T08:30:54Z; eng; Wiley; Electronics Letters; ISSN 0013-5194, 1350-911X; 2023-12-01; vol. 59, no. 24; pages n/a; https://doi.org/10.1049/ell2.13058; Mengmeng Li (Academy of Military Sciences, Beijing, China); Yi Liu (Academy of Military Sciences, Beijing, China); Qibin Zheng (Academy of Military Sciences, Beijing, China); Gengsong Li (National Innovation Institute of Defense Technology, Beijing, China); Wei Qin (Academy of Military Sciences, Beijing, China); keywords: evolutionary computation, feature selection, pattern classification |
title | An evolutionary computation classification method for high‐dimensional mixed missing variables data |
topic | evolutionary computation feature selection pattern classification |
url | https://doi.org/10.1049/ell2.13058 |
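
The particle encoding summarized in the description field can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one continuous and one discrete feature, omits the feature filtering and feature discrimination steps, and uses a toy leave-one-out 1-nearest-neighbour classifier as a stand-in for whichever classifier supplies the accuracy objective. All function names, the PSO coefficients (0.7, 1.5, 1.5), and the example data are hypothetical.

```python
import random

def decode(particle, n_values):
    """Split a flat particle into (mean, std, value probabilities)."""
    mean = particle[0]
    std = abs(particle[1]) + 1e-9           # keep the standard deviation positive
    weights = [abs(w) for w in particle[2:2 + n_values]]
    total = sum(weights)
    if total == 0:                           # degenerate particle: fall back to uniform
        return mean, std, [1.0 / n_values] * n_values
    return mean, std, [w / total for w in weights]

def impute(rows, particle, values, rng):
    """Fill None entries; column 0 is continuous, column 1 is discrete."""
    mean, std, probs = decode(particle, len(values))
    filled = []
    for cont, disc in rows:
        if cont is None:
            cont = rng.gauss(mean, std)      # sample from the encoded Gaussian
        if disc is None:
            disc = rng.choices(values, weights=probs, k=1)[0]
        filled.append((cont, disc))
    return filled

def accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (assumed stand-in)."""
    hits = 0
    for i, (ci, di) in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, (cj, dj) in enumerate(rows):
            if i == j:
                continue
            d = (ci - cj) ** 2 + (0.0 if di == dj else 1.0)
            if d < best_d:
                best_j, best_d = j, d
        hits += labels[best_j] == labels[i]
    return hits / len(rows)

def fitness(particle, rows, labels, values, seed=0):
    """Accuracy after imputation; the fixed seed makes each evaluation repeatable."""
    return accuracy(impute(rows, particle, values, random.Random(seed)), labels)

def pso(rows, labels, values, n_particles=10, n_iters=20, seed=1):
    """Global-best PSO over [mean, std, p_1, ..., p_k]; maximizes fitness."""
    rng = random.Random(seed)
    dim = 2 + len(values)
    pos = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, rows, labels, values) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i], rows, labels, values)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy data: None marks a missing value.
rows = [(0.1, "a"), (None, "a"), (0.2, None), (0.9, "b"), (None, "b"), (1.0, None)]
labels = [0, 0, 0, 1, 1, 1]
best, best_fit = pso(rows, labels, ["a", "b"])
```

Here `best` holds the encoded mean, standard deviation, and two unnormalized value weights, and `best_fit` is the accuracy reached with that imputation. Because accuracy is the objective, the labels steer the choice of imputation distribution, which is the property the description attributes to PSOHM.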