A particle swarm based hybrid system for imbalanced medical data sampling
Abstract. Background: Medical and biological data commonly have small sample sizes, missing values, and, most importantly, imbalanced class distributions. In this study we propose a particle swarm based hybrid system for remedying the class imbalance pro...
Main Authors: | Zhou Bing B, Xu Liang, Yang Pengyi, Zhang Zili, Zomaya Albert Y |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2009-12-01 |
Series: | BMC Genomics |
_version_ | 1819117482654302208 |
---|---|
author | Zhou Bing B; Xu Liang; Yang Pengyi; Zhang Zili; Zomaya Albert Y |
author_facet | Zhou Bing B; Xu Liang; Yang Pengyi; Zhang Zili; Zomaya Albert Y |
author_sort | Zhou Bing B |
collection | DOAJ |
description | Abstract. Background: Medical and biological data commonly have small sample sizes, missing values, and, most importantly, imbalanced class distributions. In this study we propose a particle swarm based hybrid system for remedying the class imbalance problem in medical and biological data mining. The hybrid system combines the particle swarm optimization (PSO) algorithm with multiple classifiers and evaluation metrics, fusing their evaluations. Samples from the majority class are ranked using multiple objectives according to their merit in compensating for the class imbalance, and then combined with the minority class to form a balanced dataset. Results: One important finding of this study is that different classifiers and metrics often give different evaluation results. Nevertheless, the proposed hybrid system shows consistent improvements over several alternative methods under three different metrics. The sampling results also generalize well across different types of classification algorithms, indicating the advantage of the information fusion applied in the hybrid system. Conclusion: The experimental results demonstrate that, unlike many currently available methods, which often perform unevenly across datasets, the proposed hybrid system generalizes better, which alleviates the method-data dependency problem. From the biological perspective, the system points to highly ranked samples for further investigation, which may result in the discovery of new conditions or disease subtypes. |
first_indexed | 2024-12-22T05:33:41Z |
format | Article |
id | doaj.art-5a3c367745aa4a7fa94b362da88588a0 |
institution | Directory Open Access Journal |
issn | 1471-2164 |
language | English |
last_indexed | 2024-12-22T05:33:41Z |
publishDate | 2009-12-01 |
publisher | BMC |
record_format | Article |
series | BMC Genomics |
spelling | doaj.art-5a3c367745aa4a7fa94b362da88588a0; 2022-12-21T18:37:23Z; eng; BMC; BMC Genomics; 1471-2164; 2009-12-01; 10; Suppl 3; S34; 10.1186/1471-2164-10-S3-S34; A particle swarm based hybrid system for imbalanced medical data sampling; Zhou Bing B; Xu Liang; Yang Pengyi; Zhang Zili; Zomaya Albert Y |
spellingShingle | Zhou Bing B; Xu Liang; Yang Pengyi; Zhang Zili; Zomaya Albert Y; A particle swarm based hybrid system for imbalanced medical data sampling; BMC Genomics |
title | A particle swarm based hybrid system for imbalanced medical data sampling |
title_sort | particle swarm based hybrid system for imbalanced medical data sampling |
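The sampling procedure summarized in the description (rank majority-class samples with a particle swarm, then pair the best of them with the minority class) can be illustrated with a minimal sketch. It is not the authors' implementation. The record does not say which classifiers or metrics the paper uses, so this sketch substitutes a single nearest-centroid classifier and averages two metrics (balanced accuracy and G-mean) as a stand-in for evaluation fusion. Ranking samples by how often the swarm's best masks include them is also an assumption, as are all function names, parameters, and PSO constants.

```python
import math
import random

def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def predict(x, c_maj, c_min):
    """Nearest-centroid rule: 0 = majority class, 1 = minority class."""
    return 1 if math.dist(x, c_min) < math.dist(x, c_maj) else 0

def fitness(mask, majority, minority, val_x, val_y):
    """Fused score: mean of balanced accuracy and G-mean on validation data."""
    chosen = [m for m, bit in zip(majority, mask) if bit]
    if not chosen:
        return 0.0  # an empty majority subset cannot train a classifier
    c_maj, c_min = centroid(chosen), centroid(minority)
    preds = [predict(x, c_maj, c_min) for x in val_x]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, val_y))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, val_y))
    sens = tp / max(1, sum(val_y))
    spec = tn / max(1, len(val_y) - sum(val_y))
    return ((sens + spec) / 2 + math.sqrt(sens * spec)) / 2

def pso_rank(majority, minority, val_x, val_y, particles=10, iters=20, seed=0):
    """Binary PSO over inclusion masks; returns per-sample selection counts."""
    rng = random.Random(seed)
    n = len(majority)
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(particles)]
    vel = [[0.0] * n for _ in range(particles)]
    best = [p[:] for p in pos]
    best_fit = [fitness(p, majority, minority, val_x, val_y) for p in pos]
    g = max(range(particles), key=best_fit.__getitem__)
    gbest, gfit = best[g][:], best_fit[g]
    counts = [0] * n
    for _ in range(iters):
        for i in range(particles):
            for d in range(n):
                # inertia plus pulls toward personal and global best (assumed constants)
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i], majority, minority, val_x, val_y)
            if f > best_fit[i]:
                best[i], best_fit[i] = pos[i][:], f
            if f > gfit:
                gbest, gfit = pos[i][:], f
        # tally which majority samples the personal-best masks currently include
        for b in best:
            for d, bit in enumerate(b):
                counts[d] += bit
    return counts

def balanced_sample(majority, minority, counts):
    """Keep the top-ranked majority samples, one per minority sample."""
    order = sorted(range(len(majority)), key=lambda d: counts[d], reverse=True)
    keep = [majority[d] for d in order[:len(minority)]]
    return keep, minority
```

Calling `pso_rank` on a majority set, a minority set, and a held-out validation split yields one count per majority sample; `balanced_sample` then keeps as many top-ranked majority samples as there are minority samples. Extending `fitness` to average over several classifiers and metrics would bring the sketch closer to the fusion idea the abstract describes.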