Improved support vector machine classification for imbalanced medical datasets by novel hybrid sampling combining modified mega-trend-diffusion and bagging extreme learning machine model
To handle imbalanced datasets in machine learning or deep learning models, some studies suggest sampling techniques to generate virtual examples of minority classes to improve the models' prediction accuracy. However, for kernel-based support vector machines (SVM), some sampling methods suggest...
Main Authors: | Liang-Sian Lin, Chen-Huan Kao, Yi-Jie Li, Hao-Hsuan Chen, Hung-Yu Chen |
---|---|
Format: | Article |
Language: | English |
Published: | AIMS Press 2023-09-01 |
Series: | Mathematical Biosciences and Engineering |
Subjects: | imbalanced datasets, hybrid sampling approach, support vectors, virtual examples |
Online Access: | https://www.aimspress.com/article/doi/10.3934/mbe.2023786?viewType=HTML |
_version_ | 1797661484621234176 |
---|---|
author | Liang-Sian Lin; Chen-Huan Kao; Yi-Jie Li; Hao-Hsuan Chen; Hung-Yu Chen |
author_sort | Liang-Sian Lin |
collection | DOAJ |
description | To handle imbalanced datasets in machine learning or deep learning models, some studies suggest sampling techniques to generate virtual examples of minority classes to improve the models' prediction accuracy. However, for kernel-based support vector machines (SVM), some sampling methods suggest generating synthetic examples in the original data space rather than in the high-dimensional feature space. This may be ineffective in improving SVM classification for imbalanced datasets. To address this problem, we propose a novel hybrid sampling technique termed modified mega-trend-diffusion-extreme learning machine (MMTD-ELM) to effectively move the SVM decision boundary toward a region of the majority class. This movement can improve the SVM's prediction of minority class examples. The proposed method combines the α-cut fuzzy number method for screening representative examples of the majority class with the MMTD method for creating new examples of the minority class. Furthermore, we construct a bagging ELM model to monitor the similarity between new examples and original data. In this paper, four datasets are used to test the efficiency of the proposed MMTD-ELM method in imbalanced data prediction. Additionally, we deployed two SVM models to compare the prediction performance of the proposed MMTD-ELM method with three state-of-the-art sampling techniques in terms of geometric mean (G-mean), F-measure (F1), index of balanced accuracy (IBA) and area under the curve (AUC) metrics. Furthermore, a paired t-test is used to determine whether the proposed method differs statistically significantly from the other sampling techniques in terms of the four evaluation metrics. The experimental results demonstrate that the proposed method achieves the best average values of G-mean, F1, IBA and AUC. Overall, the proposed MMTD-ELM method outperforms these sampling methods for imbalanced datasets. |
first_indexed | 2024-03-11T18:46:10Z |
format | Article |
id | doaj.art-efe284450da34588952261493e606225 |
institution | Directory of Open Access Journals |
issn | 1551-0018 |
language | English |
last_indexed | 2024-03-11T18:46:10Z |
publishDate | 2023-09-01 |
publisher | AIMS Press |
record_format | Article |
series | Mathematical Biosciences and Engineering |
spelling | doaj.art-efe284450da34588952261493e606225; 2023-10-12T01:24:23Z; eng; AIMS Press; Mathematical Biosciences and Engineering; ISSN 1551-0018; 2023-09-01; vol. 20, no. 10, pp. 17672-17701; doi:10.3934/mbe.2023786; Improved support vector machine classification for imbalanced medical datasets by novel hybrid sampling combining modified mega-trend-diffusion and bagging extreme learning machine model; Liang-Sian Lin (1), Chen-Huan Kao (1), Yi-Jie Li (1), Hao-Hsuan Chen (1), Hung-Yu Chen (2); 1. Department of Information Management, National Taipei University of Nursing and Health Sciences, Taipei 112303, Taiwan; 2. Department of Information Management, National Chin-Yi University of Technology, Taichung 411030, Taiwan; https://www.aimspress.com/article/doi/10.3934/mbe.2023786?viewType=HTML; imbalanced datasets; hybrid sampling approach; support vectors; virtual examples |
title | Improved support vector machine classification for imbalanced medical datasets by novel hybrid sampling combining modified mega-trend-diffusion and bagging extreme learning machine model |
topic | imbalanced datasets; hybrid sampling approach; support vectors; virtual examples |
url | https://www.aimspress.com/article/doi/10.3934/mbe.2023786?viewType=HTML |
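
The description above names G-mean, F1 and IBA as evaluation metrics but this record does not give their formulas. The following is a minimal sketch of how these three metrics are commonly computed from a binary confusion matrix, with the minority class treated as positive. The function name `imbalance_metrics`, the example counts, and the weighting `alpha=0.1` are illustrative assumptions, not values taken from the article; AUC is left out because it needs ranked scores rather than counts.

```python
import math


def imbalance_metrics(tp, fn, fp, tn, alpha=0.1):
    """Return G-mean, F1 and IBA for a binary confusion matrix.

    The minority class is taken as the positive class. `alpha` is the
    dominance weight in the commonly used IBA definition; 0.1 is an
    assumed default, not a value stated in the record.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # minority recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # majority recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # Geometric mean of the two per-class recalls.
    g_mean = math.sqrt(sensitivity * specificity)

    # Harmonic mean of precision and minority recall.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0

    # IBA weights the squared G-mean by the dominance (TPR - TNR).
    dominance = sensitivity - specificity
    iba = (1 + alpha * dominance) * sensitivity * specificity

    return {"g_mean": g_mean, "f1": f1, "iba": iba}


# Hypothetical counts: 10 minority and 90 majority test examples.
metrics = imbalance_metrics(tp=8, fn=2, fp=10, tn=80)
print(metrics)
```

All three metrics depend on the minority-class recall, so a classifier that labels every example as the majority class scores zero on each of them even though its plain accuracy would be high.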