HS-Gen: a hypersphere-constrained generation mechanism to improve synthetic minority oversampling for imbalanced classification
Abstract Mitigating the impact of class-imbalanced data on classifiers is a challenging task in machine learning. SMOTE is a well-known method to tackle this task by modifying the class distribution and generating synthetic instances. However, most of the SMOTE-based methods focus on the phase of data se...
Main Authors: | Zuowei He, Jiaqing Tao, Qiangkui Leng, Junchang Zhai, Changzhong Wang |
---|---|
Format: | Article |
Language: | English |
Published: | Springer 2022-12-01 |
Series: | Complex & Intelligent Systems |
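Subjects: | Imbalanced classification, Synthetic oversampling, SMOTE, Generation mechanism, Hypersphere constraint |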
Online Access: | https://doi.org/10.1007/s40747-022-00938-9 |
_version_ | 1797769211000389632 |
---|---|
author | Zuowei He Jiaqing Tao Qiangkui Leng Junchang Zhai Changzhong Wang |
author_sort | Zuowei He |
collection | DOAJ |
description | Abstract Mitigating the impact of class-imbalanced data on classifiers is a challenging task in machine learning. SMOTE is a well-known method that tackles this task by modifying the class distribution and generating synthetic instances. However, most SMOTE-based methods focus on the data selection phase, while few consider the data generation phase. This paper proposes a hypersphere-constrained generation mechanism (HS-Gen) to improve synthetic minority oversampling. Unlike the linear interpolation commonly used in SMOTE-based methods, HS-Gen generates a minority instance inside a hypersphere rather than on a straight line. This mechanism expands the distribution range of minority instances with significant randomness and diversity. Furthermore, HS-Gen includes a noise prevention strategy that adaptively shrinks the hypersphere by checking whether new instances fall into the majority class region. HS-Gen can be regarded as an oversampling optimization mechanism and can be flexibly embedded into SMOTE-based methods. We conduct comparative experiments by embedding HS-Gen into the original SMOTE, Borderline-SMOTE, ADASYN, k-means SMOTE, and RSMOTE. Experimental results show that the embedded versions generate higher-quality synthetic instances than the original ones. Moreover, on these oversampled datasets, conventional classifiers (C4.5 and AdaBoost) obtain significant performance improvements in terms of F1 measure and G-mean. |
first_indexed | 2024-03-12T21:05:37Z |
format | Article |
id | doaj.art-0790f8b7a9864043a955297155f91f51 |
institution | Directory of Open Access Journals |
issn | 2199-4536 2198-6053 |
language | English |
last_indexed | 2024-03-12T21:05:37Z |
publishDate | 2022-12-01 |
publisher | Springer |
record_format | Article |
series | Complex & Intelligent Systems |
spelling | doaj.art-0790f8b7a9864043a955297155f91f51; 2023-07-30T11:28:08Z; eng; Springer; Complex & Intelligent Systems; 2199-4536; 2198-6053; 2022-12-01; 9; 4; 3971; 3988; 10.1007/s40747-022-00938-9; HS-Gen: a hypersphere-constrained generation mechanism to improve synthetic minority oversampling for imbalanced classification; Zuowei He (College of Information Science and Technology, Bohai University); Jiaqing Tao (College of Information Science and Technology, Bohai University); Qiangkui Leng (School of Electronics and Information Engineering, Liaoning Technical University); Junchang Zhai (College of Information Science and Technology, Bohai University); Changzhong Wang (College of Mathematical Sciences, Bohai University); https://doi.org/10.1007/s40747-022-00938-9; Imbalanced classification; Synthetic oversampling; SMOTE; Generation mechanism; Hypersphere constraint |
title | HS-Gen: a hypersphere-constrained generation mechanism to improve synthetic minority oversampling for imbalanced classification |
topic | Imbalanced classification Synthetic oversampling SMOTE Generation mechanism Hypersphere constraint |
url | https://doi.org/10.1007/s40747-022-00938-9 |
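
The generation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the hypersphere is placed or how it shrinks, so the sketch assumes a ball centred on the seed minority instance with radius equal to the distance to a chosen minority neighbour, a nearest-neighbour test for "falls into the majority class region", and a fixed shrink factor. All function names and parameters (`smote_point`, `hypersphere_point`, `shrink`, `max_tries`) are hypothetical.

```python
import numpy as np


def smote_point(seed, neighbor, rng):
    """Classic SMOTE step: a random point on the segment seed -> neighbor."""
    return seed + rng.random() * (neighbor - seed)


def sample_in_ball(center, radius, rng):
    """Draw a point uniformly from a d-dimensional ball."""
    d = center.shape[0]
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    # Scaling by u**(1/d) makes the draw uniform over the ball's volume.
    r = radius * rng.random() ** (1.0 / d)
    return center + r * direction


def in_majority_region(point, minority, majority):
    """Assumed test: the nearest training instance belongs to the majority."""
    d_min = np.min(np.linalg.norm(minority - point, axis=1))
    d_maj = np.min(np.linalg.norm(majority - point, axis=1))
    return d_maj < d_min


def hypersphere_point(seed, neighbor, minority, majority, rng,
                      shrink=0.5, max_tries=10):
    """Sample inside a ball around the seed, shrinking it on noisy draws.

    Assumptions (not taken from the paper): the ball is centred on the seed,
    its initial radius is the seed-neighbor distance, and it shrinks by a
    constant factor whenever a candidate lands in the majority region.
    """
    radius = np.linalg.norm(neighbor - seed)
    for _ in range(max_tries):
        candidate = sample_in_ball(seed, radius, rng)
        if not in_majority_region(candidate, minority, majority):
            return candidate
        radius *= shrink
    return seed.copy()  # fall back to the seed if every draw was rejected


rng = np.random.default_rng(0)
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
majority = np.array([[3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
new_point = hypersphere_point(minority[0], minority[1], minority, majority, rng)
```

Centring the ball on the seed is a convenience for the sketch: as the radius shrinks, candidates approach a known minority instance, so the rejection loop tends to terminate. Because `hypersphere_point` takes the same seed and neighbour arguments as `smote_point`, it could stand in for the interpolation step of a SMOTE-style sampler, which is the kind of embedding the abstract describes.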