IRS-BAG-Integrated Radius-SMOTE Algorithm with Bagging Ensemble Learning Model for Imbalanced Data Set Classification

Bibliographic Details
Main Authors: Lilis Yuningsih, Gede Angga Pradipta, Dadang Hermawan, Putu Desiana Wulaning Ayu, Dandy Pramana Hostiadi, Roy Rudolf Huizen
Format: Article
Language: English
Published: Ital Publication 2023-10-01
Series: Emerging Science Journal
Subjects: imbalanced data, oversampling, SMOTE, bagging, classification, machine learning
Online Access: https://www.ijournalse.org/index.php/ESJ/article/view/1758
author Lilis Yuningsih
Gede Angga Pradipta
Dadang Hermawan
Putu Desiana Wulaning Ayu
Dandy Pramana Hostiadi
Roy Rudolf Huizen
collection DOAJ
description Imbalanced learning problems arise when data samples are unevenly distributed among classes, which poses a challenge for classifiers. The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the best-known data pre-processing methods. Oversampling with SMOTE, however, can introduce noise, small disjuncts, and overfitting when the dataset has a high imbalance ratio. A high imbalance ratio combined with low variance causes the synthetic samples to cluster in narrow areas and in regions where classes conflict, which makes models susceptible to overfitting during learning. This research therefore proposes the IRS-BAG model, a combination of Radius-SMOTE and the bagging algorithm. Each sub-sample generated by bootstrapping is oversampled with Radius-SMOTE; oversampling at the sub-sample level is intended to reduce the overfitting that might otherwise occur. The IRS-BAG model was compared experimentally with various previous oversampling methods on public imbalanced datasets. With three different classifiers, every classifier showed a notable improvement when combined with the proposed IRS-BAG model compared with the previous state-of-the-art oversampling methods. DOI: 10.28991/ESJ-2023-07-05-04
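The pipeline outlined in the description (bootstrap a sub-sample, oversample it, train a base learner, then aggregate by voting) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not specify how Radius-SMOTE selects or filters samples, so `oversample_minority` is a hypothetical stand-in that uses plain SMOTE-style interpolation between random minority pairs. The 3-nearest-neighbour base learner, the binary-label setting, and all function names are likewise assumptions made for the example.

```python
import random
from collections import Counter

def bootstrap(X, y, rng):
    """Draw len(X) rows with replacement (one bagging sub-sample)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def oversample_minority(X, y, rng):
    """Stand-in oversampler (NOT Radius-SMOTE): interpolate between random
    pairs of minority rows until the smallest class matches the largest."""
    counts = Counter(y)
    if len(counts) < 2:
        return list(X), list(y)  # sub-sample holds one class; nothing to balance
    minority = min(counts, key=counts.get)
    pool = [x for x, label in zip(X, y) if label == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(max(counts.values()) - counts[minority]):
        a, b = rng.choice(pool), rng.choice(pool)
        t = rng.random()
        X_new.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_new.append(minority)
    return X_new, y_new

def knn_predict(train_X, train_y, x, k=3):
    """Assumed base learner: majority label among the k nearest training rows."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def fit_bagged(X, y, n_estimators=15, seed=0):
    """Oversample inside each bootstrap sub-sample, then keep it as a model."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        Xb, yb = bootstrap(X, y, rng)
        models.append(oversample_minority(Xb, yb, rng))
    return models

def predict_bagged(models, x):
    """Aggregate the base learners by majority vote."""
    votes = Counter(knn_predict(Xb, yb, x) for Xb, yb in models)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy data: 20 majority rows near the origin, 4 minority rows near (5, 5).
    X = [[i * 0.1, j * 0.1] for i in range(5) for j in range(4)]
    X += [[5.0, 5.0], [5.2, 5.1], [4.9, 5.3], [5.1, 4.8]]
    y = [0] * 20 + [1] * 4
    models = fit_bagged(X, y)
    print(predict_bagged(models, [0.1, 0.1]), predict_bagged(models, [5.0, 5.0]))
```

Oversampling after the bootstrap draw, rather than once on the full training set, means each base learner sees a different set of synthetic samples, which is the property the description credits with reducing overfitting.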
first_indexed 2024-03-08T14:25:23Z
format Article
id doaj.art-9cf1f44b02d14c4eb4695ea8c264660e
institution Directory Open Access Journal
issn 2610-9182
language English
last_indexed 2024-03-08T14:25:23Z
publishDate 2023-10-01
publisher Ital Publication
record_format Article
series Emerging Science Journal
spelling doaj.art-9cf1f44b02d14c4eb4695ea8c264660e 2024-01-13T07:27:37Z eng Ital Publication Emerging Science Journal 2610-9182 2023-10-01 7 5 1501 1516 10.28991/ESJ-2023-07-05-04 540
IRS-BAG-Integrated Radius-SMOTE Algorithm with Bagging Ensemble Learning Model for Imbalanced Data Set Classification
Lilis Yuningsih (0): Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234
Gede Angga Pradipta (1): Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234
Dadang Hermawan (2): Department of Digital Business, Faculty Business and Vocation, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234
Putu Desiana Wulaning Ayu (3): Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234
Dandy Pramana Hostiadi (4): Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234
Roy Rudolf Huizen (5): Post Graduate Department of Information System, Faculty Computer and Informatics, Institut Teknologi dan Bisnis STIKOM Bali, Denpasar 80234
title IRS-BAG-Integrated Radius-SMOTE Algorithm with Bagging Ensemble Learning Model for Imbalanced Data Set Classification
topic imbalanced data
oversampling
SMOTE
bagging
classification
machine learning
url https://www.ijournalse.org/index.php/ESJ/article/view/1758