An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight

Multilabel data share important features, including label imbalance, which has a significant influence on the performance of classifiers. Because of this problem, a widely used multilabel classification algorithm, the multilabel k-nearest neighbor (ML-kNN) algorithm, has poor performance on imbalanc...

Bibliographic Details
Main Authors: Zhe Wang, Hao Xu, Pan Zhou, Gang Xiao
Format: Article
Language: English
Published: MDPI AG 2023-02-01
Series: Computation
Subjects: label imbalance; multilabel classification; ML-kNN
Online Access: https://www.mdpi.com/2079-3197/11/2/32
collection DOAJ
description Multilabel data share important features, including label imbalance, which has a significant influence on the performance of classifiers. Because of this problem, a widely used multilabel classification algorithm, the multilabel k-nearest neighbor (ML-kNN) algorithm, performs poorly on imbalanced multilabel data. To address this problem, this study proposes an improved ML-kNN algorithm based on value and weight. In this improved algorithm, labels are divided into minority and majority labels, and a different strategy is adopted for each group. By considering the latent label information carried by the nearest neighbors, a value calculation method is proposed and used to directly classify majority labels. Additionally, to address the misclassification problem caused by a lack of nearest neighbor information for minority labels, a weight calculation is proposed. The proposed weight calculation converts distance information for nearest neighbors with and without the label set into weights. The experimental results on multilabel datasets from different benchmarks demonstrate the performance of the algorithm, especially for datasets with high imbalance. Different evaluation metrics show that the results are improved by approximately 2–10%. The verified algorithm could be applied to multilabel classification in various fields involving label imbalance, such as drug molecule identification, building identification, and text categorization.
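The description above gives no formulas, so the following is only a minimal sketch of the general idea (split labels into minority and majority, then treat each group differently), not the authors' value or weight calculation. The function names, the 0.2 positive-rate threshold, the plain neighbor vote for majority labels, and the inverse-distance weighting for minority labels are all illustrative assumptions.

```python
import numpy as np

def split_labels(Y, ratio_threshold=0.2):
    """Return a boolean mask marking minority labels.

    Assumption: a label is 'minority' when its positive rate in the
    training set is below ratio_threshold (not a value from the paper).
    """
    return Y.mean(axis=0) < ratio_threshold

def predict(X_train, Y_train, x, k=3, ratio_threshold=0.2, eps=1e-9):
    """Predict a 0/1 label vector for one query point x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:k]           # indices of the k nearest neighbors
    nn_labels = Y_train[nn]              # shape (k, n_labels)
    nn_weights = 1.0 / (dists[nn] + eps) # closer neighbors weigh more

    minority = split_labels(Y_train, ratio_threshold)

    # Majority labels (assumed stand-in): plain neighbor vote.
    vote = nn_labels.mean(axis=0) > 0.5

    # Minority labels (assumed stand-in): compare the total inverse-distance
    # weight of neighbors carrying the label with that of neighbors lacking it.
    w_with = nn_weights @ nn_labels
    w_without = nn_weights @ (1 - nn_labels)
    weighted = w_with > w_without

    return np.where(minority, weighted, vote).astype(int)

# Toy data: label 0 is common, label 1 appears on a single training point.
X_train = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [5.0]])
Y_train = np.array([[1, 0], [1, 0], [1, 0], [0, 0], [0, 0], [0, 1]])

print(predict(X_train, Y_train, np.array([4.0])))  # -> [0 1]
```

In this toy case only one of the three nearest neighbors carries the rare label, so a plain vote would reject it; the distance weighting lets the single close neighbor outweigh the two distant ones.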
first_indexed 2024-03-11T08:58:48Z
format Article
id doaj.art-46d99ae025bd41dd8ba605d67fcab04b
institution Directory Open Access Journal
issn 2079-3197
language English
last_indexed 2024-03-11T08:58:48Z
publishDate 2023-02-01
publisher MDPI AG
record_format Article
series Computation
spelling doaj.art-46d99ae025bd41dd8ba605d67fcab04b; 2023-11-16T19:52:56Z; eng; MDPI AG; Computation; ISSN 2079-3197; 2023-02-01; vol. 11, no. 2, article 32; DOI 10.3390/computation11020032
Zhe Wang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 323000, China
Hao Xu: College of Engineering, Lishui University, Lishui 323000, China
Pan Zhou: College of Engineering, Lishui University, Lishui 323000, China
Gang Xiao: College of Information Engineering, Zhejiang University of Technology, Hangzhou 323000, China
title An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight
topic label imbalance
multilabel classification
ML-kNN
url https://www.mdpi.com/2079-3197/11/2/32