The Impact of Partial Balance of Imbalanced Dataset on Classification Performance
Main Authors: | Qing Li, Chang Zhao, Xintai He, Kun Chen, Runze Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-04-01 |
Series: | Electronics |
Subjects: | |
Online Access: | https://www.mdpi.com/2079-9292/11/9/1322 |
author | Qing Li, Chang Zhao, Xintai He, Kun Chen, Runze Wang |
collection | DOAJ |
description | Class imbalance in network data seriously degrades the classification performance of algorithms. Most studies describe data imbalance only roughly and explore little of the specific factors that affect classification performance, which makes it difficult to propose targeted solutions. In this paper, we find that the impact of medium categories on classification performance cannot be ignored, and we therefore propose the concept of partial balance, consisting of the Class Number of Partial Balance (β) and the Balance Degree of Partial Samples (μ). Together with the Global Slope (α), these are used to build a parameterized model that describes how imbalanced datasets differ. Experiments are performed on the Moore Dataset and the CICIDS 2017 Dataset. The experimental results for Random Forest, Decision Tree and Deep Neural Network show that increasing α is conducive to improving the performance of both the minority classes and all classes overall. When β of the dominant categories increases, that of the inferior classes decreases, which lowers the average performance of the minority classes. The lower μ is, the closer the sample size of the medium classes is to that of the minority classes, and the better the average performance. Based on these conclusions, we propose some basic strategies and verify them with various classical algorithms. |
id | doaj.art-12fdf00ef428425f9140c004ba088b36 |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
citation | Electronics, vol. 11, no. 9, article 1322 (2022-04-01) |
doi | 10.3390/electronics11091322 |
affiliation | Department of Information System Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China (all five authors) |
topic | network traffic classification; data imbalance; imbalance degree; minority class; partial balance |
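The abstract contrasts the average performance of minority classes with the performance of all classes. This record does not state which metric or which class-size cut-off the authors use, so the following is only a hedged illustration under stated assumptions: it takes per-class F1 as the performance measure and uses a hypothetical fixed training-sample threshold to label a class as "minority". The helper names (`per_class_f1`, `minority_classes`, `mean_f1`), the `threshold` parameter and the toy traffic labels are invented for this sketch, and it does not implement the paper's α, β or μ parameters.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed from paired true/predicted labels."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)
    scores = {}
    for label in set(y_true) | set(y_pred):
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def minority_classes(train_sizes, threshold):
    """Hypothetical rule (not from the paper): a class is 'minority'
    if it has fewer than `threshold` training samples."""
    return {label for label, n in train_sizes.items() if n < threshold}


def mean_f1(scores, labels):
    """Unweighted (macro) mean F1 over `labels`; 0.0 if `labels` is empty."""
    labels = list(labels)
    if not labels:
        return 0.0
    return sum(scores.get(label, 0.0) for label in labels) / len(labels)


if __name__ == "__main__":
    # Toy traffic classes and training-set sizes, invented for illustration.
    train_sizes = {"web": 5000, "mail": 800, "p2p": 40, "game": 25}
    y_true = ["web", "web", "mail", "p2p", "game", "game"]
    y_pred = ["web", "web", "mail", "web", "game", "p2p"]

    scores = per_class_f1(y_true, y_pred)
    minority = minority_classes(train_sizes, threshold=100)
    print("minority classes:", sorted(minority))
    print("minority mean F1:", round(mean_f1(scores, minority), 4))
    print("overall mean F1:", round(mean_f1(scores, scores), 4))
```

The unweighted mean is used so that each small class counts as much as a large one; a sample-weighted average would be dominated by the largest classes and could mask how the minority classes behave.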