An evaluation of RNA-seq differential analysis methods.

Bibliographic Details
Main Authors: Dongmei Li, Martin S Zand, Timothy D Dye, Maciej L Goniewicz, Irfan Rahman, Zidian Xie
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0264246
collection DOAJ
description RNA-seq is a high-throughput sequencing technology widely used for gene transcript discovery and quantification under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of differentially expressed genes among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods available in open-source R and Bioconductor packages, we conducted multiple simulation studies comparing the performance of eight methods (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons covered scenarios with equal or unequal library sizes, different distributional assumptions, and different sample sizes. We measured performance using false discovery rate (FDR) control, power, and stability. No significant differences were observed for FDR control, power, or stability across methods, whether with equal or unequal library sizes. For RNA-seq count data following a negative binomial distribution with a sample size of 3 in each group, EBSeq performed better than the other methods in terms of FDR control, power, and stability. When the sample size increased to 6 or 12 in each group, DESeq2 performed slightly better than the other methods. All methods except DESeq showed improved performance when the sample size increased to 12 in each group. For RNA-seq count data following a log-normal distribution, both DESeq and DESeq2 performed better than the other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and the stability of discoveries for each method.
For RNA-seq data analysis, the EBSeq method is recommended for studies with sample sizes as small as 3 in each group, and the DESeq2 method is recommended for sample sizes of 6 or more in each group when the data follow the negative binomial distribution. Both DESeq and DESeq2 are recommended when the data follow the log-normal distribution.
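
The description above scores each method on FDR control and power in simulations where the truly differentially expressed genes are known. As a rough illustration of how those two quantities could be computed for one simulated data set, here is a minimal Python sketch. It assumes the method under test returns one p-value per gene and that discoveries are called with the Benjamini-Hochberg procedure at a 0.05 threshold; the abstract does not state which procedure or threshold the authors used, and the function names `benjamini_hochberg` and `empirical_fdr_and_power` are hypothetical, not taken from the paper or from any of the listed packages.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag each gene as a discovery using the Benjamini-Hochberg step-up rule.

    Assumption: the method being evaluated yields p-values. Methods that
    report posterior probabilities instead would need a different rule.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def empirical_fdr_and_power(
    rejected: Sequence[bool], truly_de: Sequence[bool]
) -> Tuple[float, float]:
    """Compare discoveries with the simulated ground truth.

    FDR   = false discoveries / all discoveries (0 if nothing is called).
    Power = true discoveries / all truly differentially expressed genes.
    """
    discoveries = sum(rejected)
    true_pos = sum(r and t for r, t in zip(rejected, truly_de))
    false_pos = discoveries - true_pos
    fdr = false_pos / discoveries if discoveries else 0.0
    n_de = sum(truly_de)
    power = true_pos / n_de if n_de else 0.0
    return fdr, power


# Toy example with made-up p-values for six genes; three are truly DE.
pvalues = [0.001, 0.004, 0.010, 0.30, 0.50, 0.90]
truly_de = [True, False, True, True, False, False]

rejected = benjamini_hochberg(pvalues, alpha=0.05)
fdr, power = empirical_fdr_and_power(rejected, truly_de)
print(rejected, round(fdr, 3), round(power, 3))
```

In this toy input the first three genes are called, one of them wrongly, so the empirical FDR is 1/3 and the power is 2/3. In a simulation study these values would be averaged over many simulated data sets for each method and sample size.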
id doaj.art-bcc519671ea149ac87e084540998c40b
institution Directory Open Access Journal
issn 1932-6203
spelling doaj.art-bcc519671ea149ac87e084540998c40b; 2022-12-22T01:47:53Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2022-01-01; vol. 17, no. 9, e0264246; doi:10.1371/journal.pone.0264246