Evaluating statistical analysis models for RNA sequencing experiments

Validating statistical analysis methods for RNA sequencing (RNA-seq) experiments is a complex task. Researchers often find themselves having to decide between competing models or assess the reliability of results obtained with a designated analysis program. Computer simulation has been the most fr...

Bibliographic Details
Main Authors: Pablo Reeb, Juan Steibel
Format: Article
Language: English
Published: Frontiers Media S.A. 2013-09-01
Series: Frontiers in Genetics
Subjects:
Online Access: http://journal.frontiersin.org/Journal/10.3389/fgene.2013.00178/full
_version_ 1811278392104845312
author Pablo Reeb
Juan Steibel
author_facet Pablo Reeb
Juan Steibel
author_sort Pablo Reeb
collection DOAJ
description Validating statistical analysis methods for RNA sequencing (RNA-seq) experiments is a complex task. Researchers often find themselves having to decide between competing models or assess the reliability of results obtained with a designated analysis program. Computer simulation has been the most frequently used procedure to verify the adequacy of a model. However, datasets generated by simulations depend on the parameterization and the assumptions of the selected model. Moreover, such datasets may constitute only a partial representation of reality, as the complexity of RNA-seq data is hard to mimic. We present the use of plasmode datasets to complement the evaluation of statistical models for RNA-seq data. A plasmode is a dataset obtained from experimental data but for which some truth is known. Using a set of simulated scenarios of technical and biological replicates, and publicly available datasets, we illustrate how to design algorithms to construct plasmodes under different experimental conditions. We contrast results from two types of methods for RNA-seq: i) models based on the negative binomial distribution (edgeR and DESeq), and ii) Gaussian models applied after transformation of data (MAANOVA). Results emphasize that deciding which method to use may be experiment-specific due to the unknown distributions of expression levels. Plasmodes may help in choosing which method to apply by using a similar pre-existing dataset. The promising results obtained from this approach emphasize the need to promote and improve systematic data sharing across the research community to facilitate plasmode building. Although we illustrate the use of plasmodes for comparing differential expression analysis models, the flexibility of plasmode construction also allows comparing upstream analyses, such as normalization procedures or alignment pipelines.
first_indexed 2024-04-13T00:34:56Z
format Article
id doaj.art-d735f9be0cc64778b33c6cdc4b9a8ae7
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-04-13T00:34:56Z
publishDate 2013-09-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-d735f9be0cc64778b33c6cdc4b9a8ae7; 2022-12-22T03:10:22Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2013-09-01; 4; 10.3389/fgene.2013.00178; 60627; Evaluating statistical analysis models for RNA sequencing experiments; Pablo Reeb (Michigan State University); Juan Steibel (Michigan State University)
title Evaluating statistical analysis models for RNA sequencing experiments
topic simulation
linear models
Type I error
RNA-seq
plasmodes
url http://journal.frontiersin.org/Journal/10.3389/fgene.2013.00178/full