A particle swarm based hybrid system for imbalanced medical data sampling

Bibliographic Details
Main Authors: Zhou Bing B, Xu Liang, Yang Pengyi, Zhang Zili, Zomaya Albert Y
Format: Article
Language: English
Published: BMC 2009-12-01
Series: BMC Genomics
_version_ 1819117482654302208
author Zhou Bing B
Xu Liang
Yang Pengyi
Zhang Zili
Zomaya Albert Y
collection DOAJ
description <p>Abstract</p> <p>Background</p> <p>Medical and biological data commonly have small sample sizes, missing values, and, most importantly, imbalanced class distributions. In this study we propose a particle swarm based hybrid system for remedying the class imbalance problem in medical and biological data mining. This hybrid system combines the particle swarm optimization (PSO) algorithm with multiple classifiers and evaluation metrics for evaluation fusion. Samples from the majority class are ranked using multiple objectives according to their merit in compensating for the class imbalance, and then combined with the minority class to form a balanced dataset.</p> <p>Results</p> <p>One important finding of this study is that different classifiers and metrics often provide different evaluation results. Nevertheless, the proposed hybrid system demonstrates consistent improvements over several alternative methods with three different metrics. The sampling results also generalize well across different types of classification algorithms, indicating the advantage of the information fusion applied in the hybrid system.</p> <p>Conclusion</p> <p>The experimental results demonstrate that, unlike many currently available methods, which often perform unevenly across different datasets, the proposed hybrid system generalizes better, which alleviates the method-data dependency problem. From the biological perspective, the system points to highly ranked samples for further investigation, which may result in the discovery of new conditions or disease subtypes.</p>
first_indexed 2024-12-22T05:33:41Z
format Article
id doaj.art-5a3c367745aa4a7fa94b362da88588a0
institution Directory of Open Access Journals
issn 1471-2164
language English
last_indexed 2024-12-22T05:33:41Z
publishDate 2009-12-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-5a3c367745aa4a7fa94b362da88588a0 2022-12-21T18:37:23Z eng BMC BMC Genomics 1471-2164 2009-12-01 10 Suppl 3 S34 10.1186/1471-2164-10-S3-S34 A particle swarm based hybrid system for imbalanced medical data sampling Zhou Bing B, Xu Liang, Yang Pengyi, Zhang Zili, Zomaya Albert Y
title A particle swarm based hybrid system for imbalanced medical data sampling