An evaluation of RNA-seq differential analysis methods.
Main Authors: | Dongmei Li, Martin S Zand, Timothy D Dye, Maciej L Goniewicz, Irfan Rahman, Zidian Xie |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2022-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0264246 |
author | Dongmei Li, Martin S Zand, Timothy D Dye, Maciej L Goniewicz, Irfan Rahman, Zidian Xie |
---|---|
collection | DOAJ |
description | RNA-seq is a high-throughput sequencing technology widely used to discover and quantify gene transcripts under different biological or biomedical conditions. A fundamental research question in most RNA-seq experiments is the identification of genes that are differentially expressed among experimental conditions or sample groups. Numerous statistical methods for RNA-seq differential analysis have been proposed since the emergence of the RNA-seq assay. To evaluate popular differential analysis methods available in open-source R and Bioconductor packages, we conducted multiple simulation studies comparing the performance of eight methods (edgeR, DESeq, DESeq2, baySeq, EBSeq, NOISeq, SAMSeq, Voom). The comparisons covered scenarios with equal or unequal library sizes, different distributional assumptions, and different sample sizes. We measured performance by false discovery rate (FDR) control, power, and stability. No significant differences were observed for FDR control, power, or stability across methods, whether library sizes were equal or unequal. For RNA-seq count data following a negative binomial distribution with 3 samples per group, EBSeq performed better than the other methods in terms of FDR control, power, and stability. When the sample size increased to 6 or 12 per group, DESeq2 performed slightly better than the other methods. All methods except DESeq showed improved performance when the sample size increased to 12 per group. For RNA-seq count data following a log-normal distribution, both DESeq and DESeq2 performed better than the other methods in terms of FDR control, power, and stability across all sample sizes. Real RNA-seq experimental data were also used to compare the total number of discoveries and the stability of discoveries for each method. For RNA-seq data analysis, EBSeq is recommended for studies with as few as 3 samples per group, and DESeq2 is recommended for 6 or more samples per group when the data follow a negative binomial distribution. Both DESeq and DESeq2 are recommended when the data follow a log-normal distribution. |
institution | Directory of Open Access Journals |
issn | 1932-6203 |
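
The abstract scores each method by FDR control and power against a simulated ground truth. The sketch below is a hedged illustration only and is not the authors' code. It shows one common way such metrics can be computed: per-gene p-values are Benjamini-Hochberg adjusted, genes are called at a nominal threshold, and two quantities are then measured against the known truth. Empirical FDR is false discoveries divided by all discoveries, and power is true discoveries divided by truly differentially expressed genes. The function names, the toy p-values, and the 0.1 threshold are hypothetical. This record describes neither the paper's exact calling rule nor its stability measure.

```python
from typing import Sequence


def bh_adjust(pvalues: Sequence[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(pvalues)
    # Gene indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, taking a cumulative minimum
    # so adjusted values never decrease as raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = min(1.0, pvalues[i] * m / rank)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


def empirical_fdr_and_power(
    called: Sequence[bool], truly_de: Sequence[bool]
) -> tuple[float, float]:
    """Empirical FDR = FP / discoveries; power = TP / truly DE genes."""
    tp = sum(c and t for c, t in zip(called, truly_de))
    fp = sum(c and not t for c, t in zip(called, truly_de))
    discoveries = tp + fp
    n_de = sum(truly_de)
    # Define FDR as 0 when nothing is called, and power as 0 when no gene is DE.
    fdr = fp / discoveries if discoveries else 0.0
    power = tp / n_de if n_de else 0.0
    return fdr, power


if __name__ == "__main__":
    # Hypothetical toy data: 8 genes, 4 of them truly differentially expressed.
    pvalues = [0.001, 0.004, 0.02, 0.03, 0.20, 0.50, 0.70, 0.90]
    truly_de = [True, True, True, False, True, False, False, False]
    adjusted = bh_adjust(pvalues)
    called = [a <= 0.1 for a in adjusted]  # hypothetical nominal FDR of 0.1
    fdr, power = empirical_fdr_and_power(called, truly_de)
    print(fdr, power)  # prints "0.25 0.75"
```

In a simulation study these two numbers would be averaged over many replicate datasets per scenario (sample size, library sizes, distribution). Repeating the calls across replicates is also one plausible basis for a stability measure, though the record does not say how the authors defined it.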