Entropy-based gene ranking without selection bias for the predictive classification of microarray data

Abstract Background We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of cl...

Bibliographic Details
Main Authors:	Serafini Maria, Furlanello Cesare, Merler Stefano, Jurman Giuseppe
Format:	Article
Language:	English
Published:	BMC 2003-11-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/4/54

collection	DOAJ
description	Abstract. Background: We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). Results: With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Conclusions: Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
first_indexed	2024-04-14T03:54:12Z
format	Article
id	doaj.art-3598abb28f1a4366a007c8b087f25bdf
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-04-14T03:54:12Z
publishDate	2003-11-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
doi	10.1186/1471-2105-4-54
title	Entropy-based gene ranking without selection bias for the predictive classification of microarray data
url	http://www.biomedcentral.com/1471-2105/4/54
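
The description in this record says that E-RFE speeds up RFE by discarding chunks of genes according to an entropy measure of the SVM weight distribution. The following Python sketch illustrates that general idea only; it is not the authors' implementation. The function names, the 10-bin histogram, the threshold at half the maximum entropy, and the rule of dropping everything in the lowest bin are all assumptions made for illustration. Weights are passed in directly instead of being taken from a trained SVM.

```python
import numpy as np


def weight_entropy(weights, n_bins=10):
    """Return (entropy in bits, scaled scores) for a weight vector.

    Scores are the squared weights rescaled to [0, 1]; the entropy is the
    Shannon entropy of their histogram. Bin count is an assumption.
    """
    w = np.asarray(weights, dtype=float) ** 2
    span = w.max() - w.min()
    scaled = (w - w.min()) / span if span > 0 else np.zeros_like(w)
    counts, _ = np.histogram(scaled, bins=n_bins, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum()), scaled


def features_to_drop(weights, n_bins=10, entropy_threshold=None):
    """Pick feature indices to eliminate in one RFE step (hypothetical rule).

    Low entropy means the scores pile up in a few bins, so the whole lowest
    bin is dropped at once. Otherwise a single feature is dropped, as in
    one-at-a-time RFE.
    """
    if entropy_threshold is None:
        # Assumed default: half of the maximum possible entropy.
        entropy_threshold = 0.5 * np.log2(n_bins)
    h, scaled = weight_entropy(weights, n_bins)
    if h < entropy_threshold:
        low = np.flatnonzero(scaled < 1.0 / n_bins)
        # Never drop every remaining feature in a single step.
        if 0 < low.size < scaled.size:
            return low
    return np.array([int(np.argmin(scaled))])
```

In a full elimination loop, a linear classifier would be refit on the surviving genes after each call, and the order in which genes are removed would give the ranking. To keep the error estimate free of selection bias, as the description stresses, such a loop would need to run only on training samples within each resampling split.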

Similar Items