Hybrid Sampling and Dynamic Weighting-Based Classification Method for Multi-Class Imbalanced Data Stream


Bibliographic Details
Main Authors: Meng Han, Ang Li, Zhihui Gao, Dongliang Mu, Shujuan Liu
Format: Article
Language: English
Published: MDPI AG 2023-05-01
Series: Applied Sciences
Subjects: data stream; multi-class imbalance; concept drift; hybrid sampling; classifier weighting
Online Access: https://www.mdpi.com/2076-3417/13/10/5924
_version_ 1797601257573056512
author Meng Han
Ang Li
Zhihui Gao
Dongliang Mu
Shujuan Liu
author_facet Meng Han
Ang Li
Zhihui Gao
Dongliang Mu
Shujuan Liu
author_sort Meng Han
collection DOAJ
description The imbalance and concept drift problems in data streams become more complex in a multi-class environment, where extreme imbalance and variation in class ratios may also occur. To tackle these problems, a Hybrid Sampling and Dynamic Weighting-based classification method for Multi-class Imbalanced data streams (HSDW-MI) is proposed. The HSDW-MI algorithm deals with imbalance and concept drift through its hybrid sampling and dynamic weighting phases, respectively. In the hybrid sampling phase, adaptive spectral clustering is proposed so that the data are sampled after clustering, which maintains the original data distribution; the sample safety factor is then used to determine which samples of each class are to be sampled; within each cluster, the safe samples are oversampled and the unsafe samples are undersampled. If the data stream is extremely imbalanced, a sample storage pool is used to extract samples with a high safety factor and add them to the data stream. In the dynamic weighting phase, a dynamic weighting method based on the G-mean value is proposed: the G-mean values are used as the weights of the base classifiers in the ensemble, and the ensemble is dynamically updated while the data stream is processed to accommodate concept drift. Experiments were conducted with LB, OAUE, ARF, BOLE, MUOB, MOOD, CALMID, and the proposed HSDW-MI on 10 multi-class synthetic data streams with different class ratios and concept drifts and on 3 real multi-class imbalanced streams with unknown drifts. The results show that the proposed HSDW-MI has better classification capabilities and performs more consistently than all the other algorithms.
first_indexed 2024-03-11T03:58:35Z
format Article
id doaj.art-5c9590221bc6410da93356e35003a05a
institution Directory of Open Access Journals
issn 2076-3417
language English
last_indexed 2024-03-11T03:58:35Z
publishDate 2023-05-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-5c9590221bc6410da93356e35003a05a
2023-11-18T00:17:49Z
eng
MDPI AG
Applied Sciences
2076-3417
2023-05-01
13
10
5924
10.3390/app13105924
Hybrid Sampling and Dynamic Weighting-Based Classification Method for Multi-Class Imbalanced Data Stream
Meng Han, School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
Ang Li, School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
Zhihui Gao, School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
Dongliang Mu, School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
Shujuan Liu, School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
The imbalance and concept drift problems in data streams become more complex in a multi-class environment, where extreme imbalance and variation in class ratios may also occur. To tackle these problems, a Hybrid Sampling and Dynamic Weighting-based classification method for Multi-class Imbalanced data streams (HSDW-MI) is proposed. The HSDW-MI algorithm deals with imbalance and concept drift through its hybrid sampling and dynamic weighting phases, respectively. In the hybrid sampling phase, adaptive spectral clustering is proposed so that the data are sampled after clustering, which maintains the original data distribution; the sample safety factor is then used to determine which samples of each class are to be sampled; within each cluster, the safe samples are oversampled and the unsafe samples are undersampled. If the data stream is extremely imbalanced, a sample storage pool is used to extract samples with a high safety factor and add them to the data stream. In the dynamic weighting phase, a dynamic weighting method based on the G-mean value is proposed: the G-mean values are used as the weights of the base classifiers in the ensemble, and the ensemble is dynamically updated while the data stream is processed to accommodate concept drift. Experiments were conducted with LB, OAUE, ARF, BOLE, MUOB, MOOD, CALMID, and the proposed HSDW-MI on 10 multi-class synthetic data streams with different class ratios and concept drifts and on 3 real multi-class imbalanced streams with unknown drifts. The results show that the proposed HSDW-MI has better classification capabilities and performs more consistently than all the other algorithms.
https://www.mdpi.com/2076-3417/13/10/5924
data stream
multi-class imbalance
concept drift
hybrid sampling
classifier weighting
spellingShingle Meng Han
Ang Li
Zhihui Gao
Dongliang Mu
Shujuan Liu
Hybrid Sampling and Dynamic Weighting-Based Classification Method for Multi-Class Imbalanced Data Stream
Applied Sciences
data stream
multi-class imbalance
concept drift
hybrid sampling
classifier weighting
title Hybrid Sampling and Dynamic Weighting-Based Classification Method for Multi-Class Imbalanced Data Stream
title_full Hybrid Sampling and Dynamic Weighting-Based Classification Method for Multi-Class Imbalanced Data Stream
title_fullStr Hybrid Sampling and Dynamic Weighting-Based Classification Method for Multi-Class Imbalanced Data Stream
title_full_unstemmed Hybrid Sampling and Dynamic Weighting-Based Classification Method for Multi-Class Imbalanced Data Stream
title_short Hybrid Sampling and Dynamic Weighting-Based Classification Method for Multi-Class Imbalanced Data Stream
title_sort hybrid sampling and dynamic weighting based classification method for multi class imbalanced data stream
topic data stream
multi-class imbalance
concept drift
hybrid sampling
classifier weighting
url https://www.mdpi.com/2076-3417/13/10/5924
work_keys_str_mv AT menghan hybridsamplinganddynamicweightingbasedclassificationmethodformulticlassimbalanceddatastream
AT angli hybridsamplinganddynamicweightingbasedclassificationmethodformulticlassimbalanceddatastream
AT zhihuigao hybridsamplinganddynamicweightingbasedclassificationmethodformulticlassimbalanceddatastream
AT dongliangmu hybridsamplinganddynamicweightingbasedclassificationmethodformulticlassimbalanceddatastream
AT shujuanliu hybridsamplinganddynamicweightingbasedclassificationmethodformulticlassimbalanceddatastream
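
Illustrative note on the dynamic weighting phase: the description field states that G-mean values serve as the weights of the base classifiers, but this record gives no formulas. The sketch below is not the authors' implementation. It assumes the commonly used multi-class definition of G-mean (the geometric mean of per-class recall) and a plain weighted majority vote; the function names `g_mean` and `weighted_vote` and the sample labels are hypothetical.

```python
import math
from collections import defaultdict


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of G-mean).

    Only classes that occur in y_true contribute. A class that is never
    predicted correctly drives the score to 0.
    """
    totals = defaultdict(int)  # number of true samples per class
    hits = defaultdict(int)    # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    if not totals:
        return 0.0
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))


def weighted_vote(predictions, weights):
    """Return the label with the largest summed weight.

    predictions[i] is the label chosen by base classifier i and weights[i]
    is its weight (here, its G-mean on recent data). Ties go to the label
    that was seen first.
    """
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)


# Hypothetical recent window with three classes, "a" being the majority.
y_true = ["a", "a", "a", "a", "b", "b", "c", "c"]
always_majority = ["a"] * len(y_true)
mixed = ["a", "a", "b", "a", "b", "b", "c", "a"]

# One weight per base classifier, recomputed as new labeled data arrives.
weights = [g_mean(y_true, always_majority), g_mean(y_true, mixed)]
ensemble_label = weighted_vote(["a", "c"], weights)
```

Under this assumed definition, a classifier that only ever predicts the majority class receives a weight of 0, so it cannot outvote a classifier that recognizes the minority classes. Recomputing the weights on recent data is one way an ensemble could follow concept drift, as the description outlines.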