Radius-SMOTE: A New Oversampling Technique of Minority Samples Based on Radius Distance for Learning from Imbalanced Data

Imbalanced learning problems are a challenge faced by classifiers when data samples have an unbalanced distribution in each class. Furthermore, the synthetic oversampling method (SMOTE) is a preprocessing technique widely used to synthesize new data and balance the different numbers of samples in ea...

Bibliographic Details
Main Authors: Pradipta, G.A., Wardoyo, R., Musdholifah, A., Sanjaya, I.N.H.
Format: Article
Published: Institute of Electrical and Electronics Engineers Inc. 2021
Subjects: Structural Geology
author Pradipta, G.A.
Wardoyo, R.
Musdholifah, A.
Sanjaya, I.N.H.
collection UGM
description Imbalanced learning problems arise when data samples are unevenly distributed across classes, which poses a challenge for classifiers. The synthetic minority oversampling technique (SMOTE) is a widely used preprocessing method that synthesizes new data to balance the number of samples in each class. One family of SMOTE extensions is based on the initial selection approach, which determines the best candidates for oversampling before synthetic example generation starts. However, SMOTE and most existing oversampling methods based on initial selection still produce overlapping data in the final result, which makes it difficult for a classifier to determine the decision boundary of each class. Therefore, this research proposes a new oversampling technique called Radius-SMOTE, which emphasizes the initial selection approach and creates synthetic data based on a safe radius distance. The safe radius distance prevents new synthetic data from overlapping with the opposite class. Radius-SMOTE was evaluated extensively on thirteen artificial imbalanced datasets from the KEEL repository. The experimental results show that the proposed method achieves the best results on 5 datasets, namely yeast-1-4-5-8vs7, ecoli-0-1-3-7vs2-6, Umbilical cord, Pima, and Haberman, in terms of various assessment metrics. In addition, the computational cost of the proposed method is relatively low, with an average time of 0.5 to 1 second on the 13 tested datasets. © 2013 IEEE.
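The description names two ideas, initial selection of minority candidates and generation of synthetic points within a safe radius, but does not say how either is computed. The following Python sketch is one possible reading and not the authors' algorithm. Every concrete choice in it is an assumption made for illustration: the function name `radius_oversample`, the parameter `k`, the selection rule (a minority sample is dropped when its k nearest neighbours are all majority samples), and the definition of the safe radius as the distance from a minority sample to its nearest majority sample.

```python
import numpy as np


def radius_oversample(X_min, X_maj, n_new, k=5, seed=0):
    """Illustrative radius-based oversampling (assumed rules, see lead-in).

    X_min, X_maj: 2-D arrays of minority and majority samples.
    n_new: number of synthetic minority samples to generate.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    n_min = len(X_min)

    # Initial selection (assumption): keep a minority sample only if at
    # least one of its k nearest neighbours is also a minority sample.
    X_all = np.vstack([X_min, X_maj])
    is_min = np.arange(len(X_all)) < n_min
    d_all = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    d_all[np.arange(n_min), np.arange(n_min)] = np.inf  # ignore self
    nearest = np.argsort(d_all, axis=1)[:, :k]
    keep = is_min[nearest].any(axis=1)
    seeds = X_min[keep]
    if len(seeds) == 0:
        raise ValueError("no minority sample passed the selection step")

    # Safe radius (assumption): distance to the nearest majority sample.
    d_maj = np.linalg.norm(seeds[:, None, :] - X_maj[None, :, :], axis=2)
    radius = d_maj.min(axis=1)

    # Draw each synthetic point in a random direction from a random seed,
    # at a distance strictly smaller than that seed's safe radius.
    idx = rng.integers(0, len(seeds), size=n_new)
    direction = rng.normal(size=(n_new, X_min.shape[1]))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    step = rng.uniform(0.0, 1.0, size=(n_new, 1)) * radius[idx, None]
    return seeds[idx] + direction * step
```

Under these assumptions, each synthetic point lies inside a ball around its seed that contains no majority sample, which is one way to read "prevented from overlapping in the opposite class". Plain SMOTE, by contrast, interpolates between a minority sample and one of its minority neighbours without checking where the majority samples are.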
first_indexed 2024-03-14T00:03:43Z
format Article
id oai:generic.eprints.org:280330
institution Universitas Gadjah Mada
last_indexed 2024-03-14T00:03:43Z
publishDate 2021
publisher Institute of Electrical and Electronics Engineers Inc.
record_format dspace
spelling oai:generic.eprints.org:280330 2023-11-10T05:52:05Z https://repository.ugm.ac.id/280330/ Article PeerReviewed Pradipta, G.A. and Wardoyo, R. and Musdholifah, A. and Sanjaya, I.N.H. (2021) Radius-SMOTE: A New Oversampling Technique of Minority Samples Based on Radius Distance for Learning from Imbalanced Data. IEEE Access, 9. pp. 74763-74777. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85105844830&doi=10.1109%2fACCESS.2021.3080316&partnerID=40&md5=7e77620750be71b98b44fa9206f1ed38
title Radius-SMOTE: A New Oversampling Technique of Minority Samples Based on Radius Distance for Learning from Imbalanced Data
topic Structural Geology