Evaluating statistical analysis models for RNA sequencing experiments

Validating statistical analysis methods for RNA sequencing (RNA-seq) experiments is a complex task. Researchers often find themselves having to decide between competing models or to assess the reliability of results obtained with a designated analysis program. Computer simulation has been the most fr...

Bibliographic Details
Main Authors:	Pablo Reeb, Juan Steibel
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2013-09-01
Series:	Frontiers in Genetics
Subjects:	simulation; linear models; Type I error; RNA-seq; plasmodes
Online Access:	http://journal.frontiersin.org/Journal/10.3389/fgene.2013.00178/full

author	Pablo Reeb, Juan Steibel
collection	DOAJ
description	Validating statistical analysis methods for RNA sequencing (RNA-seq) experiments is a complex task. Researchers often find themselves having to decide between competing models or to assess the reliability of results obtained with a designated analysis program. Computer simulation has been the most frequently used procedure to verify the adequacy of a model. However, datasets generated by simulations depend on the parameterization and the assumptions of the selected model. Moreover, such datasets may constitute only a partial representation of reality, as the complexity of RNA-seq data is hard to mimic. We present the use of plasmode datasets to complement the evaluation of statistical models for RNA-seq data. A plasmode is a dataset obtained from experimental data but for which some truth is known. Using a set of simulated scenarios of technical and biological replicates, and publicly available datasets, we illustrate how to design algorithms to construct plasmodes under different experimental conditions. We contrast results from two types of methods for RNA-seq: i) models based on the negative binomial distribution (edgeR and DESeq), and ii) Gaussian models applied after transformation of the data (MAANOVA). Results emphasize that deciding which method to use may be experiment-specific due to the unknown distributions of expression levels. Plasmodes may help in choosing which method to apply by using a similar pre-existing dataset. The promising results obtained from this approach emphasize the need to promote and improve systematic data sharing across the research community to facilitate plasmode building. Although we illustrate the use of plasmodes for comparing differential expression analysis models, the flexibility of plasmode construction also allows comparing upstream analyses, such as normalization procedures or alignment pipelines.
first_indexed	2024-04-13T00:34:56Z
format	Article
id	doaj.art-d735f9be0cc64778b33c6cdc4b9a8ae7
institution	Directory of Open Access Journals
issn	1664-8021
language	English
last_indexed	2024-04-13T00:34:56Z
publishDate	2013-09-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Genetics
spelling	doaj.art-d735f9be0cc64778b33c6cdc4b9a8ae7; 2022-12-22T03:10:22Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2013-09-01; 4; 10.3389/fgene.2013.00178; 60627; Evaluating statistical analysis models for RNA sequencing experiments; Pablo Reeb (Michigan State University); Juan Steibel (Michigan State University); http://journal.frontiersin.org/Journal/10.3389/fgene.2013.00178/full; simulation; linear models; Type I error; RNA-seq; plasmodes
title	Evaluating statistical analysis models for RNA sequencing experiments
topic	simulation; linear models; Type I error; RNA-seq; plasmodes
url	http://journal.frontiersin.org/Journal/10.3389/fgene.2013.00178/full

Similar Items