Global meta-analysis of transcriptomics studies.

Transcriptomics meta-analysis aims at re-using existing data to derive novel biological hypotheses, and is motivated by the public availability of a large number of independent studies. Current methods are based on breaking down studies into multiple comparisons between phenotypes (e.g. disease vs....


Bibliographic Details
Main Authors: José Caldas, Susana Vinga
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3935861?pdf=render
author José Caldas
Susana Vinga
collection DOAJ
description Transcriptomics meta-analysis aims at re-using existing data to derive novel biological hypotheses, and is motivated by the public availability of a large number of independent studies. Current methods are based on breaking down studies into multiple comparisons between phenotypes (e.g. disease vs. healthy), based on the studies' experimental designs, followed by computing the overlap between the resulting differential expression signatures. While useful, in this methodology each study yields multiple independent phenotype comparisons, and connections are established not between studies, but rather between subsets of the studies corresponding to phenotype comparisons. We propose a rank-based statistical meta-analysis framework that establishes global connections between transcriptomics studies without breaking down studies into sets of phenotype comparisons. By using a rank product method, our framework extracts global features from each study, corresponding to genes that are consistently among the most expressed or differentially expressed genes in that study. Those features are then statistically modelled via a term-frequency inverse-document frequency (TF-IDF) model, which is then used for connecting studies. Our framework is fast and parameter-free; when applied to large collections of Homo sapiens and Streptococcus pneumoniae transcriptomics studies, it performs better than similarity-based approaches in retrieving related studies, using a Medical Subject Headings gold standard. Finally, we highlight via case studies how the framework can be used to derive novel biological hypotheses regarding related studies and the genes that drive those connections. Our proposed statistical framework shows that it is possible to perform a meta-analysis of transcriptomics studies with arbitrary experimental designs by deriving global expression features rather than decomposing studies into multiple phenotype comparisons.
first_indexed 2024-12-14T14:02:01Z
format Article
id doaj.art-5d6c66d92d804e738f075efb681f5e3f
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-14T14:02:01Z
publishDate 2014-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-5d6c66d92d804e738f075efb681f5e3f; 2022-12-21T22:58:42Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2014-01-01; vol. 9, no. 2, e89318; doi:10.1371/journal.pone.0089318; Global meta-analysis of transcriptomics studies.; José Caldas; Susana Vinga; http://europepmc.org/articles/PMC3935861?pdf=render
title Global meta-analysis of transcriptomics studies.
url http://europepmc.org/articles/PMC3935861?pdf=render
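
Illustrative sketch (not part of the record). The description above outlines two steps: a rank product that picks out genes consistently among the most expressed in a study, and a TF-IDF model over those gene features that is used to connect studies. The Python below is a minimal sketch of that general idea, not the authors' implementation. The function names, the toy data, the fixed cutoff top_k, the binary term frequency, and the cosine similarity are all assumptions made for illustration; the description calls the published framework parameter-free, so it presumably selects features differently.

```python
import math
from collections import Counter


def rank_product_features(expr, top_k):
    """Return the top_k genes with the smallest rank product.

    expr maps gene -> list of expression values, one per sample.
    Rank 1 is the most expressed gene in a sample. top_k is a
    hypothetical cutoff introduced only for this sketch.
    """
    genes = list(expr)
    n_samples = len(next(iter(expr.values())))
    log_rp = {g: 0.0 for g in genes}
    for s in range(n_samples):
        ordered = sorted(genes, key=lambda g: expr[g][s], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            # Summing log ranks orders genes the same way as the rank product.
            log_rp[g] += math.log(rank)
    return sorted(genes, key=lambda g: log_rp[g])[:top_k]


def tfidf_vectors(study_features):
    """Treat each study as a document and each feature gene as a term.

    Term frequency is binary here (an assumption); idf = log(N / df).
    """
    n = len(study_features)
    df = Counter(g for feats in study_features.values() for g in set(feats))
    return {
        study: {g: math.log(n / df[g]) for g in set(feats)}
        for study, feats in study_features.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[g] for g, w in u.items() if g in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data: three made-up studies, five genes, two samples each.
studies = {
    "S1": {"g1": [9, 9], "g2": [8, 8], "g3": [7, 7], "g4": [1, 1], "g5": [0.5, 0.5]},
    "S2": {"g1": [9, 9], "g2": [8, 8], "g3": [1, 1], "g4": [7, 7], "g5": [0.5, 0.5]},
    "S3": {"g1": [9, 9], "g2": [1, 1], "g3": [0.5, 0.5], "g4": [8, 8], "g5": [7, 7]},
}
features = {name: rank_product_features(expr, top_k=3) for name, expr in studies.items()}
vectors = tfidf_vectors(features)
```

In this toy data g1 is a feature of every study, so its inverse document frequency is zero and it contributes nothing to any connection. S1 and S2 are linked through the rarer shared gene g2, while S1 and S3 share only g1 and end up with zero similarity.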