HS-Gen: a hypersphere-constrained generation mechanism to improve synthetic minority oversampling for imbalanced classification

Bibliographic Details
Main Authors: Zuowei He, Jiaqing Tao, Qiangkui Leng, Junchang Zhai, Changzhong Wang
Format: Article
Language: English
Published: Springer 2022-12-01
Series: Complex & Intelligent Systems
Online Access: https://doi.org/10.1007/s40747-022-00938-9
Collection: DOAJ (Directory of Open Access Journals)
Abstract: Mitigating the impact of class-imbalanced data on classifiers is a challenging task in machine learning. SMOTE is a well-known method that tackles this task by modifying the class distribution through the generation of synthetic instances. However, most SMOTE-based methods focus on the data-selection phase, while few consider the data-generation phase. This paper proposes a hypersphere-constrained generation mechanism (HS-Gen) to improve synthetic minority oversampling. Unlike the linear interpolation commonly used in SMOTE-based methods, HS-Gen generates a minority instance inside a hypersphere rather than on a straight line segment. This mechanism expands the distribution range of minority instances with significant randomness and diversity. Furthermore, HS-Gen is equipped with a noise-prevention strategy that adaptively shrinks the hypersphere by determining whether new instances fall into the majority-class region. HS-Gen can be regarded as an oversampling optimization mechanism that can be flexibly embedded into SMOTE-based methods. We conduct comparative experiments by embedding HS-Gen into the original SMOTE, Borderline-SMOTE, ADASYN, k-means SMOTE, and RSMOTE. Experimental results show that the embedded versions generate higher-quality synthetic instances than the original ones. Moreover, on these oversampled datasets, the conventional classifiers C4.5 and AdaBoost achieve significant performance improvements in terms of the F1 measure and G-mean.
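The abstract contrasts SMOTE's linear interpolation with sampling inside a hypersphere, combined with an adaptive shrink for noise prevention. The Python/NumPy sketch below illustrates that contrast. It is not the authors' algorithm. The following choices are all assumptions, and every function name is hypothetical:

- the hypersphere is centred on the minority seed;
- its initial radius is the seed-to-neighbor distance;
- a k-nearest-neighbor majority vote serves as the "majority-class region" test;
- the radius shrinks by a fixed factor;
- the sampler falls back to a plain SMOTE step.

```python
import numpy as np


def smote_interpolate(x, neighbor, rng):
    """Classic SMOTE step: a random point on the segment between x and neighbor."""
    gap = rng.random()
    return x + gap * (neighbor - x)


def sample_in_hypersphere(center, radius, rng):
    """Draw one point uniformly from the d-dimensional ball B(center, radius)."""
    d = center.shape[0]
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)        # uniform direction on the unit sphere
    r = radius * rng.random() ** (1.0 / d)        # u**(1/d) makes the draw uniform in volume
    return center + r * direction


def falls_in_majority_region(candidate, X, y, minority_label, k=5):
    """Hypothetical noise test: most of the k nearest training points are majority."""
    dists = np.linalg.norm(X - candidate, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.sum(y[nearest] != minority_label) > k / 2


def hs_gen_sample(x, neighbor, X, y, minority_label, rng,
                  shrink=0.5, max_tries=5, k=5):
    """Hypothetical HS-Gen-style generation step (not the paper's exact rule).

    Samples inside a hypersphere around the minority seed x and shrinks the
    radius each time the sample appears to fall in the majority-class region.
    """
    radius = np.linalg.norm(neighbor - x)   # assumption: radius = seed-to-neighbor distance
    for _ in range(max_tries):
        candidate = sample_in_hypersphere(x, radius, rng)
        if not falls_in_majority_region(candidate, X, y, minority_label, k):
            return candidate
        radius *= shrink                    # adaptively shrink toward the seed
    return smote_interpolate(x, neighbor, rng)  # assumed fallback: plain SMOTE step


# Usage on a toy 2-D dataset (in practice, neighbor would be one of x's
# k nearest minority neighbors, as in SMOTE).
rng = np.random.default_rng(0)
X_min = rng.normal(0.0, 1.0, size=(20, 2))     # toy minority class
X_maj = rng.normal(3.0, 1.0, size=(200, 2))    # toy majority class
X = np.vstack([X_min, X_maj])
y = np.array([1] * 20 + [0] * 200)
new_point = hs_gen_sample(X_min[0], X_min[1], X, y, minority_label=1, rng=rng)
print(new_point.shape)  # (2,)
```

The sketch draws the radius as `radius * u**(1/d)` rather than `radius * u`. This keeps samples uniform over the ball's volume; without the exponent they would crowd toward the centre in higher dimensions. Centring on the seed means that repeated shrinking pulls candidates back toward a known minority instance.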
ISSN: 2199-4536, 2198-6053
Volume: 9
Issue: 4
Pages: 3971-3988
Affiliations:
Zuowei He, Jiaqing Tao, Junchang Zhai: College of Information Science and Technology, Bohai University
Qiangkui Leng: School of Electronics and Information Engineering, Liaoning Technical University
Changzhong Wang: College of Mathematical Sciences, Bohai University
Subjects: Imbalanced classification; Synthetic oversampling; SMOTE; Generation mechanism; Hypersphere constraint