An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight

Multilabel data share important features, including label imbalance, which has a significant influence on the performance of classifiers. Because of this problem, a widely used multilabel classification algorithm, the multilabel k-nearest neighbor (ML-kNN) algorithm, has poor performance on imbalanc...

Bibliographic Details
Main Authors:	Zhe Wang, Hao Xu, Pan Zhou, Gang Xiao
Format:	Article
Language:	English
Published:	MDPI AG 2023-02-01
Series:	Computation
Subjects:	label imbalance multilabel classification ML-kNN
Online Access:	https://www.mdpi.com/2079-3197/11/2/32

_version_	1827758028524355584
author	Zhe Wang Hao Xu Pan Zhou Gang Xiao
author_facet	Zhe Wang Hao Xu Pan Zhou Gang Xiao
author_sort	Zhe Wang
collection	DOAJ
description	Multilabel data share important features, including label imbalance, which has a significant influence on the performance of classifiers. Because of this problem, a widely used multilabel classification algorithm, the multilabel k-nearest neighbor (ML-kNN) algorithm, has poor performance on imbalanced multilabel data. To address this problem, this study proposes an improved ML-kNN algorithm based on value and weight. In this improved algorithm, labels are divided into minority and majority, and different strategies are adopted for different labels. By considering the label of latent information carried by the nearest neighbors, a value calculation method is proposed and used to directly classify majority labels. Additionally, to address the misclassification problem caused by a lack of nearest neighbor information for minority labels, weight calculation is proposed. The proposed weight calculation converts distance information with and without label sets in the nearest neighbors into weights. The experimental results on multilabel datasets from different benchmarks demonstrate the performance of the algorithm, especially for datasets with high imbalance. Different evaluation metrics show that the results are improved by approximately 2–10%. The verified algorithm could be applied to a multilabel classification of various fields involving label imbalance, such as drug molecule identification, building identification, and text categorization.
first_indexed	2024-03-11T08:58:48Z
format	Article
id	doaj.art-46d99ae025bd41dd8ba605d67fcab04b
institution	Directory of Open Access Journals
issn	2079-3197
language	English
last_indexed	2024-03-11T08:58:48Z
publishDate	2023-02-01
publisher	MDPI AG
record_format	Article
series	Computation
spelling	doaj.art-46d99ae025bd41dd8ba605d67fcab04b | 2023-11-16T19:52:56Z | eng | MDPI AG | Computation | 2079-3197 | 2023-02-01 | 11 | 2 | 32 | 10.3390/computation11020032 | An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight | Zhe Wang [0]; Hao Xu [1]; Pan Zhou [2]; Gang Xiao [3] | [0] College of Information Engineering, Zhejiang University of Technology, Hangzhou 323000, China; [1] College of Engineering, Lishui University, Lishui 323000, China; [2] College of Engineering, Lishui University, Lishui 323000, China; [3] College of Information Engineering, Zhejiang University of Technology, Hangzhou 323000, China | Multilabel data share important features, including label imbalance, which has a significant influence on the performance of classifiers. Because of this problem, a widely used multilabel classification algorithm, the multilabel k-nearest neighbor (ML-kNN) algorithm, has poor performance on imbalanced multilabel data. To address this problem, this study proposes an improved ML-kNN algorithm based on value and weight. In this improved algorithm, labels are divided into minority and majority, and different strategies are adopted for different labels. By considering the label of latent information carried by the nearest neighbors, a value calculation method is proposed and used to directly classify majority labels. Additionally, to address the misclassification problem caused by a lack of nearest neighbor information for minority labels, weight calculation is proposed. The proposed weight calculation converts distance information with and without label sets in the nearest neighbors into weights. The experimental results on multilabel datasets from different benchmarks demonstrate the performance of the algorithm, especially for datasets with high imbalance. Different evaluation metrics show that the results are improved by approximately 2–10%. The verified algorithm could be applied to a multilabel classification of various fields involving label imbalance, such as drug molecule identification, building identification, and text categorization. | https://www.mdpi.com/2079-3197/11/2/32 | label imbalance; multilabel classification; ML-kNN
spellingShingle	Zhe Wang Hao Xu Pan Zhou Gang Xiao An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight Computation label imbalance multilabel classification ML-kNN
title	An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight
title_full	An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight
title_fullStr	An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight
title_full_unstemmed	An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight
title_short	An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight
title_sort	improved multilabel k nearest neighbor algorithm based on value and weight
topic	label imbalance multilabel classification ML-kNN
url	https://www.mdpi.com/2079-3197/11/2/32

Similar Items