Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression

Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human...


Bibliographic Details
Main Authors: Torrente, Aurora, Lukk, Margus, Xue, Vincent, Parkinson, Helen, Rung, Johan, Brazma, Alvis
Other Authors: Massachusetts Institute of Technology. Computational and Systems Biology Program
Format: Article
Language: en_US
Published: Public Library of Science 2017
Online Access: http://hdl.handle.net/1721.1/107889
https://orcid.org/0000-0003-1199-7689
author Torrente, Aurora
Lukk, Margus
Xue, Vincent
Parkinson, Helen
Rung, Johan
Brazma, Alvis
author2 Massachusetts Institute of Technology. Computational and Systems Biology Program
author_sort Torrente, Aurora
collection MIT
description Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human gene expression dataset from ∼40,000 publicly available Affymetrix HG-U133Plus2 arrays. After strict quality control and data normalisation, the data was quantified in an expression matrix of ∼20,000 genes and ∼28,000 samples. To enable different ways of sample grouping, existing annotations were subjected to systematic ontology-assisted categorisation and manual curation. Groups such as normal tissues, neoplastic tissues, cell lines, homoeotic cells and incompletely differentiated cells were created. Unsupervised analysis of the data confirmed a global structure of expression consistent with earlier analysis, with more detail revealed by the increased resolution. A suitable mixed-effects linear model was used to further investigate gene expression in solid tissue tumours, and to compare these with the respective healthy solid tissues. The analysis identified 1,285 genes with systematic expression change in cancer. The list is significantly enriched with known cancer genes from large, public, peer-reviewed databases, whereas the remaining ones are proposed as new cancer gene candidates. The compiled dataset is publicly available in the ArrayExpress Archive. It contains the most diverse collection of biological samples, making it the largest systematically annotated gene expression dataset of its kind in the public domain.
first_indexed 2024-09-23T12:58:06Z
format Article
id mit-1721.1/107889
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T12:58:06Z
publishDate 2017
publisher Public Library of Science
record_format dspace
spelling mit-1721.1/107889 2022-09-28T11:13:07Z Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression Torrente, Aurora Lukk, Margus Xue, Vincent Parkinson, Helen Rung, Johan Brazma, Alvis Massachusetts Institute of Technology. Computational and Systems Biology Program Xue, Vincent 2017-04-05T20:44:22Z 2017-04-05T20:44:22Z 2016-06 2015-12 Article http://purl.org/eprint/type/JournalArticle 1932-6203 http://hdl.handle.net/1721.1/107889 Torrente, Aurora et al. “Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression.” Ed. Paolo Provero. PLOS ONE 11.6 (2016): e0157484. https://orcid.org/0000-0003-1199-7689 en_US http://dx.doi.org/10.1371/journal.pone.0157484 PLOS ONE Creative Commons Attribution 4.0 International License http://creativecommons.org/licenses/by/4.0/ application/pdf Public Library of Science PLOS
title Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression
url http://hdl.handle.net/1721.1/107889
https://orcid.org/0000-0003-1199-7689