A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification

Bibliographic Details
Main Authors:	Wang Lily, Statnikov Alexander, Aliferis Constantin F
Format:	Article
Language:	English
Published:	BMC 2008-07-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/9/319

collection	DOAJ
description	Abstract

Background: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology, with several molecular signatures on their way toward clinical deployment. Using the most accurate classification algorithms available for microarray gene expression data is critical to developing the best possible molecular signatures for patient care. A large body of literature to date suggests that support vector machines can be considered "best of class" algorithms for classifying such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain.

Results: In the present paper we identify methodological biases in prior work comparing random forests and support vector machines, and we conduct a new, rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underline the importance of sound research design when benchmarking and comparing bioinformatics algorithms.

Conclusion: We found that, both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines, both when no gene selection is performed and when several popular gene selection methods are used.
id	doaj.art-2aa43a2a90b64f0daa24302ca1164c1f
institution	Directory of Open Access Journals
issn	1471-2105
doi	10.1186/1471-2105-9-319

Similar Items