Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd

Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expre...


Bibliographic Details
Main Author: Szeto, Gregory
Other Authors: Massachusetts Institute of Technology. Department of Biological Engineering
Format: Article
Language: en_US
Published: Nature Publishing Group 2017
Online Access: http://hdl.handle.net/1721.1/107686
https://orcid.org/0000-0001-7604-1333
_version_ 1826206119557595136
author Szeto, Gregory
author2 Massachusetts Institute of Technology. Department of Biological Engineering
author_facet Massachusetts Institute of Technology. Department of Biological Engineering
Szeto, Gregory
author_sort Szeto, Gregory
collection MIT
description Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.
first_indexed 2024-09-23T13:24:24Z
format Article
id mit-1721.1/107686
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T13:24:24Z
publishDate 2017
publisher Nature Publishing Group
record_format dspace
spelling mit-1721.1/107686 2022-10-01T15:04:08Z
Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd
Szeto, Gregory
Massachusetts Institute of Technology. Department of Biological Engineering
Massachusetts Institute of Technology. Department of Materials Science and Engineering
Ragon Institute of MGH, MIT and Harvard
Koch Institute for Integrative Cancer Research at MIT
Szeto, Gregory
Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.
2017-03-24T14:11:49Z
2017-03-24T14:11:49Z
2016-09
2015-12
Article
http://purl.org/eprint/type/JournalArticle
2041-1723
http://hdl.handle.net/1721.1/107686
Wang, Zichen et al. “Extraction and Analysis of Signatures from the Gene Expression Omnibus by the Crowd.” Nature Communications 7 (2016): 12846.
https://orcid.org/0000-0001-7604-1333
en_US
http://dx.doi.org/10.1038/ncomms12846
Nature Communications
Creative Commons Attribution 4.0 International License
http://creativecommons.org/licenses/by/4.0/
application/pdf
Nature Publishing Group
Nature
spellingShingle Szeto, Gregory
Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd
title Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd
title_full Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd
title_fullStr Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd
title_full_unstemmed Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd
title_short Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd
title_sort extraction and analysis of signatures from the gene expression omnibus by the crowd
url http://hdl.handle.net/1721.1/107686
https://orcid.org/0000-0001-7604-1333
work_keys_str_mv AT szetogregory extractionandanalysisofsignaturesfromthegeneexpressionomnibusbythecrowd