Selecting Relevant Genes with a Spectral Approach

Array technologies have made it possible to record simultaneously the expression pattern of thousands of genes. A fundamental problem in the analysis of gene expression data is the identification of highly relevant genes that either discriminate between phenotypic labels or are important with respect to...


Bibliographic Details
Main Authors:	Wolf, Lior; Shashua, Amnon; Mukherjee, Sayan
Language:	en_US
Published:	2005
Subjects:	AI
Online Access:	http://hdl.handle.net/1721.1/30444

collection	MIT
description	Array technologies have made it possible to record simultaneously the expression pattern of thousands of genes. A fundamental problem in the analysis of gene expression data is the identification of highly relevant genes that either discriminate between phenotypic labels or are important with respect to the cellular process studied in the experiment: for example, cell cycle or heat shock in yeast experiments, chemical or genetic perturbations of mammalian cell lines, and genes involved in class discovery for human tumors. In this paper we focus on the task of unsupervised gene selection. The problem of selecting a small subset of genes is particularly challenging, as the datasets involved are typically characterized by a very small sample size (on the order of a few tens of tissue samples) and by a very large feature space, as the number of genes tends to be in the high thousands. We propose a model-independent approach which scores candidate gene selections using spectral properties of the candidate affinity matrix. The algorithm is very straightforward to implement, yet it has a number of remarkable properties which guarantee consistent sparse selections. To illustrate the value of our approach, we applied our algorithm to five different datasets. The first consists of time course data from four well-studied hematopoietic cell lines (HL-60, Jurkat, NB4, and U937). The other four datasets include three well-studied treatment outcomes (large cell lymphoma, childhood medulloblastomas, breast tumors) and one unpublished dataset (lymph status). We compared our approach both with other unsupervised methods (SOM, PCA, GS) and with supervised methods (SNR, RMB, RFE). The results clearly show that our approach considerably outperforms all the other unsupervised approaches in our study, is competitive with supervised methods, and in some cases even outperforms supervised approaches.
first_indexed	2024-09-23T16:33:32Z
id	mit-1721.1/30444
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T16:33:32Z
record_format	dspace
spelling	mit-1721.1/30444 2019-04-10T16:37:20Z 2005-12-22T01:19:04Z 2005-12-22T01:19:04Z 2004-01-27 MIT-CSAIL-TR-2004-003 AIM-2004-002 CBCL-234 Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory 0 p. 12089662 bytes 629163 bytes application/postscript application/pdf
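The description field above says that candidate gene selections are scored using spectral properties of the candidate affinity matrix, but the record gives no formula. The following is a hypothetical sketch under stated assumptions, not the report's algorithm: it assumes the affinity matrix between samples is A = sum_i alpha_i m_i m_i^T, built from weighted gene profiles m_i (rows of a genes x samples matrix M), that a selection is scored by the sum of the k largest squared eigenvalues of A, and that the gene weights alpha are found by alternating maximization. The names `spectral_score`, `rank_genes`, `k`, `n_iter`, and `seed` are invented for illustration, and the toy data is synthetic.

```python
import numpy as np


def spectral_score(M, alpha, k):
    """Sum of the k largest squared eigenvalues of the weighted affinity matrix.

    M     : genes x samples expression matrix; row i is gene profile m_i.
    alpha : one weight per gene (a hard selection is a 0/1 vector).
    k     : number of leading eigenvalues kept (hypothetical parameter).
    """
    # Sample-by-sample affinity induced by the weighted genes:
    # A = sum_i alpha_i * m_i m_i^T
    A = (M * alpha[:, None]).T @ M
    eigvals = np.linalg.eigvalsh(A)
    # Keep the k largest squared eigenvalues (A may be indefinite).
    return float(np.sort(eigvals ** 2)[-k:].sum())


def rank_genes(M, k, n_iter=30, seed=0):
    """Alternately maximize alpha^T G alpha over unit alpha and the basis Q.

    Returns unit-norm gene weights; a large |alpha_i| marks a relevant gene.
    """
    n_genes, n_samples = M.shape
    rng = np.random.default_rng(seed)
    # Random orthonormal starting basis (samples x k).
    Q, _ = np.linalg.qr(rng.standard_normal((n_samples, k)))
    gram = M @ M.T  # (m_i . m_j) for every pair of genes
    alpha = np.ones(n_genes) / np.sqrt(n_genes)
    for _ in range(n_iter):
        P = M @ Q  # projection of each gene profile onto span(Q)
        # G_ij = (m_i . m_j) * (m_i^T Q Q^T m_j), so alpha^T G alpha equals
        # trace(Q^T A^2 Q) for the affinity A built from alpha.
        G = gram * (P @ P.T)
        _, V = np.linalg.eigh(G)
        alpha = V[:, -1]  # leading eigenvector maximizes alpha^T G alpha
        if alpha.sum() < 0:
            alpha = -alpha  # eigenvectors are only defined up to sign
        # For fixed alpha, the best Q spans the eigenvectors of A with the
        # largest |eigenvalue|.
        A = (M * alpha[:, None]).T @ M
        w, U = np.linalg.eigh(A)
        Q = U[:, np.argsort(np.abs(w))[-k:]]
    return alpha


# Toy usage: 3 informative genes separate two sample groups, 7 are noise.
rng = np.random.default_rng(1)
labels = np.repeat([1.0, -1.0], 10)  # two hypothetical groups of 10 samples
informative = 3.0 * labels + 0.1 * rng.standard_normal((3, 20))
noise = rng.standard_normal((7, 20))
M = np.vstack([informative, noise])  # 10 genes x 20 samples
alpha = rank_genes(M, k=1)
print(np.argsort(-np.abs(alpha))[:3])  # indices of the top-weighted genes
```

Each step can only keep or raise the score, because the alpha step and the Q step each maximize the same quantity with the other held fixed. Taking alpha as the leading eigenvector of G lets genes that share a common structure reinforce one another, so in this toy input the three informative genes are expected to carry most of the weight; a hard selection could then keep the genes with the largest |alpha_i|.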
