Improved support vector machine classification for imbalanced medical datasets by novel hybrid sampling combining modified mega-trend-diffusion and bagging extreme learning machine model

To handle imbalanced datasets in machine learning or deep learning models, some studies suggest sampling techniques to generate virtual examples of minority classes to improve the models' prediction accuracy. However, for kernel-based support vector machines (SVM), some sampling methods suggest...


Bibliographic Details
Main Authors: Liang-Sian Lin, Chen-Huan Kao, Yi-Jie Li, Hao-Hsuan Chen, Hung-Yu Chen
Format: Article
Language: English
Published: AIMS Press 2023-09-01
Series: Mathematical Biosciences and Engineering
Subjects:
Online Access: https://www.aimspress.com/article/doi/10.3934/mbe.2023786?viewType=HTML
collection DOAJ
description To handle imbalanced datasets in machine learning or deep learning models, some studies suggest sampling techniques that generate virtual examples of the minority class to improve the models' prediction accuracy. However, for kernel-based support vector machines (SVM), some sampling methods generate synthetic examples in the original data space rather than in the high-dimensional feature space, which may be ineffective in improving SVM classification for imbalanced datasets. To address this problem, we propose a novel hybrid sampling technique, termed modified mega-trend-diffusion-extreme learning machine (MMTD-ELM), that moves the SVM decision boundary toward the region of the majority class. This shift improves the SVM's predictions for minority class examples. The proposed method combines the α-cut fuzzy number method, which screens representative examples of the majority class, with the MMTD method, which creates new examples of the minority class. Furthermore, we construct a bagging ELM model to monitor the similarity between the new examples and the original data. Four datasets are used to test the effectiveness of the proposed MMTD-ELM method in imbalanced data prediction. Additionally, we deploy two SVM models to compare the prediction performance of the proposed MMTD-ELM method with that of three state-of-the-art sampling techniques in terms of the geometric mean (G-mean), F-measure (F1), index of balanced accuracy (IBA) and area under curve (AUC) metrics. A paired t-test is used to determine whether the proposed method differs statistically significantly from the other sampling techniques on the four evaluation metrics. The experimental results demonstrate that the proposed method achieves the best average values of G-mean, F1, IBA and AUC. Overall, the proposed MMTD-ELM method outperforms these sampling methods on imbalanced datasets.
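As an illustration of three of the evaluation metrics named in the description, the following is a minimal sketch, not the authors' code. It computes G-mean, F1 and IBA from binary confusion-matrix counts, treating the minority class as positive. The function name `imbalance_metrics`, the example counts and the weighting `alpha=0.1` are assumptions chosen for illustration; the record does not state which IBA weighting the paper used. AUC is omitted because it requires ranked classifier scores rather than counts.

```python
import math


def imbalance_metrics(tp, fn, fp, tn, alpha=0.1):
    """Return G-mean, F1 and IBA for a binary confusion matrix.

    Hypothetical helper for illustration only. The minority class is
    treated as the positive class, and all denominators are assumed
    to be non-zero.
    """
    tpr = tp / (tp + fn)          # sensitivity (recall on the minority class)
    tnr = tn / (tn + fp)          # specificity (recall on the majority class)
    precision = tp / (tp + fp)

    g_mean = math.sqrt(tpr * tnr)
    f1 = 2 * precision * tpr / (precision + tpr)
    # IBA weights the squared G-mean by a dominance term (TPR - TNR);
    # alpha = 0.1 is an assumed, commonly used value.
    iba = (1 + alpha * (tpr - tnr)) * tpr * tnr
    return {"g_mean": g_mean, "f1": f1, "iba": iba}


# Made-up counts: 10 minority and 90 majority test examples.
metrics = imbalance_metrics(tp=9, fn=1, fp=9, tn=81)
print(metrics)
```

Because G-mean and IBA multiply the per-class recalls, a classifier that ignores the minority class scores near zero on both even when its overall accuracy is high, which is why such metrics are preferred over plain accuracy for imbalanced data.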
first_indexed 2024-03-11T18:46:10Z
format Article
id doaj.art-efe284450da34588952261493e606225
institution Directory Open Access Journal
issn 1551-0018
language English
last_indexed 2024-03-11T18:46:10Z
publishDate 2023-09-01
publisher AIMS Press
record_format Article
series Mathematical Biosciences and Engineering
spelling doaj.art-efe284450da34588952261493e606225 2023-10-12T01:24:23Z eng AIMS Press, Mathematical Biosciences and Engineering, ISSN 1551-0018, 2023-09-01, vol. 20, no. 10, pp. 17672-17701, doi 10.3934/mbe.2023786
Improved support vector machine classification for imbalanced medical datasets by novel hybrid sampling combining modified mega-trend-diffusion and bagging extreme learning machine model
Liang-Sian Lin (1), Chen-Huan Kao (1), Yi-Jie Li (1), Hao-Hsuan Chen (1), Hung-Yu Chen (2)
1. Department of Information Management, National Taipei University of Nursing and Health Sciences, Taipei 112303, Taiwan
2. Department of Information Management, National Chin-Yi University of Technology, Taichung 411030, Taiwan
https://www.aimspress.com/article/doi/10.3934/mbe.2023786?viewType=HTML
Keywords: imbalanced datasets; hybrid sampling approach; support vectors; virtual examples
topic imbalanced datasets
hybrid sampling approach
support vectors
virtual examples