Selecting Relevant Genes with a Spectral Approach

Array technologies have made it possible to record simultaneously the expression pattern of thousands of genes. A fundamental problem in the analysis of gene expression data is the identification of highly relevant genes that either discriminate between phenotypic labels or are important with respec...

Bibliographic Details
Main Authors:	Wolf, Lior; Shashua, Amnon; Mukherjee, Sayan
Language:	en_US
Published:	2004
Subjects:	AI
Online Access:	http://hdl.handle.net/1721.1/7282

_version_	1826197608582873088
author	Wolf, Lior Amnon Shashua, Mukherjee, Sayan
author_facet	Wolf, Lior Amnon Shashua, Mukherjee, Sayan
author_sort	Wolf, Lior
collection	MIT
description	Array technologies have made it possible to record simultaneously the expression pattern of thousands of genes. A fundamental problem in the analysis of gene expression data is the identification of highly relevant genes that either discriminate between phenotypic labels or are important with respect to the cellular process studied in the experiment: for example, cell cycle or heat shock in yeast experiments, chemical or genetic perturbations of mammalian cell lines, and genes involved in class discovery for human tumors. In this paper we focus on the task of unsupervised gene selection. The problem of selecting a small subset of genes is particularly challenging as the datasets involved are typically characterized by a very small sample size (on the order of a few tens of tissue samples) and by a very large feature space, as the number of genes tends to be in the high thousands. We propose a model-independent approach which scores candidate gene selections using spectral properties of the candidate affinity matrix. The algorithm is very straightforward to implement yet has a number of remarkable properties which guarantee consistent sparse selections. To illustrate the value of our approach we applied our algorithm to five different datasets. The first consists of time course data from four well-studied hematopoietic cell lines (HL-60, Jurkat, NB4, and U937). The other four datasets include three well-studied treatment outcomes (large cell lymphoma, childhood medulloblastomas, breast tumors) and one unpublished dataset (lymph status). We compared our approach both with other unsupervised methods (SOM, PCA, GS) and with supervised methods (SNR, RMB, RFE). The results clearly show that our approach considerably outperforms all the other unsupervised approaches in our study, is competitive with supervised methods, and in some cases even outperforms supervised approaches.
first_indexed	2024-09-23T10:50:15Z
id	mit-1721.1/7282
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T10:50:15Z
publishDate	2004
record_format	dspace
spelling	mit-1721.1/72822019-04-15T00:40:28Z Selecting Relevant Genes with a Spectral Approach Wolf, Lior Amnon Shashua, Mukherjee, Sayan AI Array technologies have made it possible to record simultaneously the expression pattern of thousands of genes. A fundamental problem in the analysis of gene expression data is the identification of highly relevant genes that either discriminate between phenotypic labels or are important with respect to the cellular process studied in the experiment: for example, cell cycle or heat shock in yeast experiments, chemical or genetic perturbations of mammalian cell lines, and genes involved in class discovery for human tumors. In this paper we focus on the task of unsupervised gene selection. The problem of selecting a small subset of genes is particularly challenging as the datasets involved are typically characterized by a very small sample size (on the order of a few tens of tissue samples) and by a very large feature space, as the number of genes tends to be in the high thousands. We propose a model-independent approach which scores candidate gene selections using spectral properties of the candidate affinity matrix. The algorithm is very straightforward to implement yet has a number of remarkable properties which guarantee consistent sparse selections. To illustrate the value of our approach we applied our algorithm to five different datasets. The first consists of time course data from four well-studied hematopoietic cell lines (HL-60, Jurkat, NB4, and U937). The other four datasets include three well-studied treatment outcomes (large cell lymphoma, childhood medulloblastomas, breast tumors) and one unpublished dataset (lymph status). We compared our approach both with other unsupervised methods (SOM, PCA, GS) and with supervised methods (SNR, RMB, RFE). The results clearly show that our approach considerably outperforms all the other unsupervised approaches in our study, is competitive with supervised methods, and in some cases even outperforms supervised approaches. 2004-10-20T21:05:21Z 2004-10-20T21:05:21Z 2004-01-27 AIM-2004-002 CBCL-234 http://hdl.handle.net/1721.1/7282 en_US AIM-2004-002 CBCL-234 2062939 bytes 836436 bytes application/postscript application/pdf application/postscript application/pdf
spellingShingle	AI Wolf, Lior Amnon Shashua, Mukherjee, Sayan Selecting Relevant Genes with a Spectral Approach
title	Selecting Relevant Genes with a Spectral Approach
title_full	Selecting Relevant Genes with a Spectral Approach
title_fullStr	Selecting Relevant Genes with a Spectral Approach
title_full_unstemmed	Selecting Relevant Genes with a Spectral Approach
title_short	Selecting Relevant Genes with a Spectral Approach
title_sort	selecting relevant genes with a spectral approach
topic	AI
url	http://hdl.handle.net/1721.1/7282
work_keys_str_mv	AT wolflior selectingrelevantgeneswithaspectralapproach AT amnonshashua selectingrelevantgeneswithaspectralapproach AT mukherjeesayan selectingrelevantgeneswithaspectralapproach

Similar Items