An Improved Multilabel k-Nearest Neighbor Algorithm Based on Value and Weight
Main Authors: | Zhe Wang, Hao Xu, Pan Zhou, Gang Xiao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-02-01 |
Series: | Computation |
Online Access: | https://www.mdpi.com/2079-3197/11/2/32 |
author | Zhe Wang, Hao Xu, Pan Zhou, Gang Xiao |
---|---|
collection | DOAJ |
description | Multilabel data have several important characteristics, including label imbalance, which significantly affects classifier performance. Because of this problem, the multilabel k-nearest neighbor (ML-kNN) algorithm, a widely used multilabel classification algorithm, performs poorly on imbalanced multilabel data. To address this, this study proposes an improved ML-kNN algorithm based on value and weight. In the improved algorithm, labels are divided into minority and majority labels, and a different strategy is applied to each group. By considering the latent label information carried by the nearest neighbors, a value calculation method is proposed and used to classify majority labels directly. Additionally, to address the misclassification of minority labels caused by a lack of nearest neighbor information, a weight calculation is proposed, which converts the distance information of nearest neighbors with and without the label set into weights. Experimental results on multilabel datasets from different benchmarks demonstrate the effectiveness of the algorithm, especially on highly imbalanced datasets. Across different evaluation metrics, the results improve by approximately 2–10%. The verified algorithm could be applied to multilabel classification in various fields involving label imbalance, such as drug molecule identification, building identification, and text categorization. |
issn | 2079-3197 |
doi | 10.3390/computation11020032 |
volume / issue / article | 11 / 2 / 32 |
affiliations | Zhe Wang, Gang Xiao: College of Information Engineering, Zhejiang University of Technology, Hangzhou 323000, China; Hao Xu, Pan Zhou: College of Engineering, Lishui University, Lishui 323000, China |
topic | label imbalance; multilabel classification; ML-kNN |
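The abstract describes the structure of the method (split labels into minority and majority groups, then use a "value" rule for majority labels and a distance-based "weight" rule for minority labels) but does not give the formulas. The Python below is only a sketch of that structure under stated assumptions, not the authors' algorithm. The minority/majority split assumes the per-label imbalance ratio IRLbl and its mean MeanIR (measures from Charte et al.), which the abstract does not confirm as the paper's criterion. A plain majority vote among the k nearest neighbors stands in for the value calculation, and an inverse-distance weight of neighbors with versus without the label stands in for the weight calculation; both rules and the helper names `imbalance_ratios`, `split_labels`, `nearest`, and `predict` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def imbalance_ratios(Y: Sequence[Sequence[int]]) -> List[float]:
    """IRLbl per label: count of the most frequent label divided by this label's count.

    Assumption: every label occurs at least once in the training set.
    """
    n_labels = len(Y[0])
    counts = [sum(row[j] for row in Y) for j in range(n_labels)]
    if min(counts) == 0:
        raise ValueError("every label must occur at least once in the training set")
    top = max(counts)
    return [top / c for c in counts]


def split_labels(Y: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Assumed criterion: labels with IRLbl above MeanIR are minority, the rest majority."""
    ir = imbalance_ratios(Y)
    mean_ir = sum(ir) / len(ir)
    minority = [j for j, r in enumerate(ir) if r > mean_ir]
    majority = [j for j, r in enumerate(ir) if r <= mean_ir]
    return minority, majority


def nearest(X, x, k: int) -> List[Tuple[float, int]]:
    """(distance, index) pairs for the k training points closest to x (Euclidean)."""
    dists = sorted((math.dist(row, x), i) for i, row in enumerate(X))
    return dists[:k]


def predict(X, Y, x, k: int = 3, eps: float = 1e-9) -> List[int]:
    minority, majority = split_labels(Y)
    neigh = nearest(X, x, k)
    y_hat = [0] * len(Y[0])
    for j in majority:
        # Hypothetical "value" stand-in: predict the label if most neighbors carry it.
        votes = sum(Y[i][j] for _, i in neigh)
        y_hat[j] = int(votes > k / 2)
    for j in minority:
        # Hypothetical "weight" stand-in: compare inverse-distance weights of
        # neighbors that have label j against neighbors that do not.
        w_with = sum(1 / (d + eps) for d, i in neigh if Y[i][j])
        w_without = sum(1 / (d + eps) for d, i in neigh if not Y[i][j])
        y_hat[j] = int(w_with > w_without)
    return y_hat


# Toy data: label 0 is frequent, label 1 is rare.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
Y = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 0], [0, 1]]
print(split_labels(Y))               # ([1], [0]): label 1 is the minority label
print(predict(X, Y, [5, 5.2], k=3))  # [1, 1]
```

In the toy query, only one of the three neighbors lacks the rare label, but even a single very close neighbor carrying it would outweigh farther neighbors without it under the distance weighting. That is the kind of effect the abstract attributes to using distance information for minority labels, though the paper's actual weighting may differ in every detail.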