iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects
Abstract. Background: Single-cell RNA sequencing (scRNA-seq) technology has enabled assessment of transcriptome-wide changes at single-cell resolution. Because environmental exposure and genetic background are heterogeneous across subjects, the subject effect is the major source of variation...
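Main Authors: | Yunqing Liu, Jiayi Zhao, Taylor S. Adams, Ningya Wang, Jonas C. Schupp, Weimiao Wu, John E. McDonough, Geoffrey L. Chupp, Naftali Kaminski, Zuoheng Wang, Xiting Yan |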
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2023-08-01 |
Series: | BMC Bioinformatics |
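Subjects: | Single-cell RNA sequencing, Differential expression analysis, Subject effect, Zero-inflated negative binomial mixed model |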
Online Access: | https://doi.org/10.1186/s12859-023-05432-8 |
_version_ | 1797556107220090880 |
---|---|
author | Yunqing Liu Jiayi Zhao Taylor S. Adams Ningya Wang Jonas C. Schupp Weimiao Wu John E. McDonough Geoffrey L. Chupp Naftali Kaminski Zuoheng Wang Xiting Yan |
author_facet | Yunqing Liu Jiayi Zhao Taylor S. Adams Ningya Wang Jonas C. Schupp Weimiao Wu John E. McDonough Geoffrey L. Chupp Naftali Kaminski Zuoheng Wang Xiting Yan |
author_sort | Yunqing Liu |
collection | DOAJ |
description | Abstract. Background: Single-cell RNA sequencing (scRNA-seq) technology has enabled assessment of transcriptome-wide changes at single-cell resolution. Because environmental exposure and genetic background are heterogeneous across subjects, the subject effect is the major source of variation in scRNA-seq data with multiple subjects and severely confounds cell-type-specific differential expression (DE) analysis. Moreover, dropout events are prevalent in scRNA-seq data, leading to an excessive number of zeros, which further complicates DE analysis. Results: We developed iDESC to detect cell-type-specific DE genes between two groups of subjects in scRNA-seq data. iDESC uses a zero-inflated negative binomial mixed model to account for both the subject effect and dropouts. The prevalence of dropout events (the dropout rate) was shown to depend on gene expression level, and this dependence is modeled by pooling information across genes. The subject effect is modeled as a random effect in the log-mean of the negative binomial component. We evaluated and compared the performance of iDESC with eleven existing DE analysis methods. Using simulated data, we demonstrated that iDESC had well-controlled type I error and higher power than the existing methods. Applying the methods with well-controlled type I error to three real scRNA-seq datasets from the same tissue and disease showed that the iDESC results were the most consistent between datasets and the most disease-relevant. Conclusions: iDESC achieved more accurate and robust DE analysis results by separating the subject effect from the disease effect while accounting for dropouts, suggesting the importance of considering the subject effect and dropouts in the DE analysis of scRNA-seq data with multiple subjects. |
first_indexed | 2024-03-10T16:57:04Z |
format | Article |
id | doaj.art-98bb000233a34cca8225fba23b3cf131 |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-03-10T16:57:04Z |
publishDate | 2023-08-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-98bb000233a34cca8225fba23b3cf131; 2023-11-20T11:06:39Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2023-08-01; 241120; 10.1186/s12859-023-05432-8; iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects; Yunqing Liu (0), Jiayi Zhao (1), Taylor S. Adams (2), Ningya Wang (3), Jonas C. Schupp (4), Weimiao Wu (5), John E. McDonough (6), Geoffrey L. Chupp (7), Naftali Kaminski (8), Zuoheng Wang (9), Xiting Yan (10); Department of Biostatistics, Yale School of Public Health; Department of Biostatistics, Yale School of Public Health; Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine; Department of Biostatistics, Yale School of Public Health; Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine; Department of Biostatistics, Yale School of Public Health; Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine; Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine; Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine; Department of Biostatistics, Yale School of Public Health; Department of Biostatistics, Yale School of Public Health; Abstract. Background: Single-cell RNA sequencing (scRNA-seq) technology has enabled assessment of transcriptome-wide changes at single-cell resolution. Because environmental exposure and genetic background are heterogeneous across subjects, the subject effect is the major source of variation in scRNA-seq data with multiple subjects and severely confounds cell-type-specific differential expression (DE) analysis. Moreover, dropout events are prevalent in scRNA-seq data, leading to an excessive number of zeros, which further complicates DE analysis. Results: We developed iDESC to detect cell-type-specific DE genes between two groups of subjects in scRNA-seq data. iDESC uses a zero-inflated negative binomial mixed model to account for both the subject effect and dropouts. The prevalence of dropout events (the dropout rate) was shown to depend on gene expression level, and this dependence is modeled by pooling information across genes. The subject effect is modeled as a random effect in the log-mean of the negative binomial component. We evaluated and compared the performance of iDESC with eleven existing DE analysis methods. Using simulated data, we demonstrated that iDESC had well-controlled type I error and higher power than the existing methods. Applying the methods with well-controlled type I error to three real scRNA-seq datasets from the same tissue and disease showed that the iDESC results were the most consistent between datasets and the most disease-relevant. Conclusions: iDESC achieved more accurate and robust DE analysis results by separating the subject effect from the disease effect while accounting for dropouts, suggesting the importance of considering the subject effect and dropouts in the DE analysis of scRNA-seq data with multiple subjects.; https://doi.org/10.1186/s12859-023-05432-8; Single-cell RNA sequencing; Differential expression analysis; Subject effect; Zero-inflated negative binomial mixed model |
spellingShingle | Yunqing Liu Jiayi Zhao Taylor S. Adams Ningya Wang Jonas C. Schupp Weimiao Wu John E. McDonough Geoffrey L. Chupp Naftali Kaminski Zuoheng Wang Xiting Yan iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects BMC Bioinformatics Single-cell RNA sequencing Differential expression analysis Subject effect Zero-inflated negative binomial mixed model |
title | iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects |
title_full | iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects |
title_fullStr | iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects |
title_full_unstemmed | iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects |
title_short | iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects |
title_sort | idesc identifying differential expression in single cell rna sequencing data with multiple subjects |
topic | Single-cell RNA sequencing Differential expression analysis Subject effect Zero-inflated negative binomial mixed model |
url | https://doi.org/10.1186/s12859-023-05432-8 |
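
The description in this record names a zero-inflated negative binomial mixed model with the subject effect entering the log-mean of the negative binomial component. The Python sketch below illustrates that general model family only; it is not the authors' implementation. The function names, the size-factor offset, and every parameter value are illustrative assumptions, and the dependence of the dropout rate on expression level is left out because the record does not give its form.

```python
import math


def nb_logpmf(y, mu, theta):
    """Log-pmf of a negative binomial with mean mu and dispersion theta.

    Parameterized so that the variance is mu + mu**2 / theta.
    """
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_pmf(y, mu, theta, pi):
    """Zero-inflated NB: with probability pi the count is a dropout zero,
    otherwise it is drawn from the negative binomial component."""
    nb = math.exp(nb_logpmf(y, mu, theta))
    return pi * (y == 0) + (1.0 - pi) * nb


def cell_mean(size_factor, beta0, beta1, group, subject_effect):
    """Mean of the NB component for one cell (hypothetical parameterization).

    log(mu) = log(size_factor) + beta0 + beta1 * group + subject_effect,
    where group is 0 or 1 and subject_effect is the random effect shared
    by all cells from the same subject.
    """
    return size_factor * math.exp(beta0 + beta1 * group + subject_effect)


# Illustrative values only: two subjects in the same group whose random
# effects differ, which shifts the NB mean and the probability of a zero.
mu_a = cell_mean(1.0, beta0=1.0, beta1=0.5, group=1, subject_effect=0.2)
mu_b = cell_mean(1.0, beta0=1.0, beta1=0.5, group=1, subject_effect=-0.4)
p_zero_a = zinb_pmf(0, mu_a, theta=2.0, pi=0.3)
p_zero_b = zinb_pmf(0, mu_b, theta=2.0, pi=0.3)
```

In this sketch the group coefficient `beta1` plays the role of the disease effect, and a DE test would ask whether it differs from zero after the subject-level term has absorbed between-subject variation.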