An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data

Bibliographic Details
Main Authors: Michael Lecocke, Kenneth Hess
Format: Article
Language: English
Published: SAGE Publishing 2006-01-01
Series: Cancer Informatics
Subjects: cross-validation, feature selection, supervised-learning, genetic algorithm
Online Access: http://la-press.com/article.php?article_id=94
author Michael Lecocke
Kenneth Hess
collection DOAJ
description Background: We consider both univariate- and multivariate-based feature selection for the problem of binary classification with microarray data. The aim is to determine whether the more sophisticated multivariate approach leads to lower misclassification error rates, given its potential to consider jointly significant subsets of genes (but without overfitting the data).
Methods: We present an empirical study in which 10-fold cross-validation is applied externally to one univariate-based and two multivariate (genetic algorithm, GA) based feature selection processes. These procedures are applied with respect to three supervised learning algorithms and six published two-class microarray datasets.
Results: Considering all datasets and learning algorithms, the average 10-fold external cross-validation error rates for the univariate, single-stage GA, and two-stage GA-based processes are 14.2%, 14.6%, and 14.2%, respectively. We also find that the optimism bias estimates from the GA analyses were half those of the univariate approach, but the selection bias estimates from the GA analyses were 2.5 times those of the univariate approach.
Conclusions: We find that the 10-fold external cross-validation misclassification error rates were very comparable. Further, the two-stage GA approach did not demonstrate a significant advantage over the single-stage approach. We also find that the univariate approach had higher optimism bias and lower selection bias than both GA approaches.
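The key point of applying cross-validation "externally" is that feature selection is repeated inside every training fold, so the held-out fold never influences which genes are chosen. The following is a minimal sketch of that idea, not the authors' code. It assumes a univariate Welch t-statistic filter and a nearest-centroid classifier as stand-ins; the abstract names neither the ranking statistic, the number of genes kept, nor the three learning algorithms. All function names, the default `k=10`, and the synthetic data generator are hypothetical and chosen only for illustration.

```python
import random
import statistics


def welch_t(a, b):
    """Absolute Welch t-statistic comparing one gene between two groups."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def select_genes(X, y, k):
    """Hypothetical univariate filter: rank genes by |t| and keep the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append((welch_t(a, b), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def nearest_centroid_fit(X, y, genes):
    """Per-class mean of each selected gene (stand-in learning algorithm)."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean(r[g] for r in rows) for g in genes]
    return centroids


def nearest_centroid_predict(centroids, row, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[g] - m) ** 2 for g, m in zip(genes, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def external_cv_error(X, y, k=10, n_folds=10, seed=0):
    """External CV: genes are re-selected on each training fold only."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    errors = 0
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        genes = select_genes(Xtr, ytr, k)  # the test fold is never seen here
        model = nearest_centroid_fit(Xtr, ytr, genes)
        errors += sum(nearest_centroid_predict(model, X[i], genes) != y[i] for i in test)
    return errors / len(y)


def make_toy_data(n_per_class=20, n_genes=200, n_signal=5, shift=2.0, seed=1):
    """Synthetic two-class matrix (illustration only, not microarray data):
    only the first n_signal genes differ in mean between the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label if g < n_signal else 0.0, 1.0)
                      for g in range(n_genes)])
            y.append(label)
    return X, y
```

For example, `X, y = make_toy_data(); external_cv_error(X, y)` returns a misclassification rate in which gene selection has been honestly cross-validated. A GA-based selector would slot in where `select_genes` is called, and the same fold loop would remain unchanged. Selecting genes once on the full dataset and cross-validating only the classifier would instead leak test-fold information into the selection step.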
first_indexed 2024-12-13T01:32:09Z
format Article
id doaj.art-b95046ebc10149b2aed1254946f7279c
institution Directory of Open Access Journals
issn 1176-9351
language English
last_indexed 2024-12-13T01:32:09Z
publishDate 2006-01-01
publisher SAGE Publishing
record_format Article
series Cancer Informatics
volume 2
pages 313-327
last_updated 2022-12-22T00:03:59Z
title An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data
topic cross-validation
feature selection
supervised-learning
genetic algorithm
url http://la-press.com/article.php?article_id=94