iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects

Abstract. Background: Single-cell RNA sequencing (scRNA-seq) technology has enabled assessment of transcriptome-wide changes at single-cell resolution. Because environmental exposure and genetic background are heterogeneous across subjects, the subject effect is the major source of variation...


Bibliographic Details
Main Authors: Yunqing Liu, Jiayi Zhao, Taylor S. Adams, Ningya Wang, Jonas C. Schupp, Weimiao Wu, John E. McDonough, Geoffrey L. Chupp, Naftali Kaminski, Zuoheng Wang, Xiting Yan
Format: Article
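Language: English
Published: BMC 2023-08-01
Series: BMC Bioinformatics
Subjects: Single-cell RNA sequencing; Differential expression analysis; Subject effect; Zero-inflated negative binomial mixed model
Online Access: https://doi.org/10.1186/s12859-023-05432-8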
author Yunqing Liu
Jiayi Zhao
Taylor S. Adams
Ningya Wang
Jonas C. Schupp
Weimiao Wu
John E. McDonough
Geoffrey L. Chupp
Naftali Kaminski
Zuoheng Wang
Xiting Yan
collection DOAJ
description Abstract. Background: Single-cell RNA sequencing (scRNA-seq) technology has enabled assessment of transcriptome-wide changes at single-cell resolution. Because environmental exposure and genetic background are heterogeneous across subjects, the subject effect is the major source of variation in scRNA-seq data with multiple subjects, which severely confounds cell-type-specific differential expression (DE) analysis. Moreover, dropout events are prevalent in scRNA-seq data and lead to an excessive number of zeros, which further complicates DE analysis. Results: We developed iDESC to detect cell-type-specific DE genes between two groups of subjects in scRNA-seq data. iDESC uses a zero-inflated negative binomial mixed model to account for both subject effect and dropouts. The prevalence of dropout events (dropout rate) was shown to depend on gene expression level, and this dependence is modeled by pooling information across genes. The subject effect is modeled as a random effect in the log-mean of the negative binomial component. We compared the performance of iDESC with eleven existing DE analysis methods. Using simulated data, we demonstrated that iDESC had well-controlled type I error and higher power than the existing methods. When the methods with well-controlled type I error were applied to three real scRNA-seq datasets from the same tissue and disease, iDESC achieved the best consistency between datasets and the best disease relevance. Conclusions: iDESC achieved more accurate and robust DE analysis results by separating subject effect from disease effect while accounting for dropouts, suggesting the importance of considering subject effect and dropouts in the DE analysis of scRNA-seq data with multiple subjects.
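The model structure named in the description (a zero-inflated negative binomial likelihood, a subject random effect in the log-mean, and a dropout rate that depends on expression level) can be illustrated with a minimal sketch. This is not the iDESC implementation. The mean/dispersion parameterization, the library-size offset, the logistic form of `dropout_rate`, and all function and parameter names are assumptions introduced here for illustration only.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-pmf of a negative binomial with mean mu and dispersion theta.

    Assumed parameterization: variance = mu + mu**2 / theta.
    """
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(y, mu, theta, pi):
    """Log-pmf of a zero-inflated NB: with probability pi the count is a dropout zero."""
    if y == 0:
        # A zero is either a dropout or a genuine NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(y, mu, theta)


def cell_mean(beta0, beta1, group, subject_effect, library_size):
    """NB mean for one cell (hypothetical linear predictor).

    log(mu) = log(library_size) + beta0 + beta1 * group + subject_effect,
    where subject_effect is the random effect shared by all cells of a subject.
    """
    return library_size * math.exp(beta0 + beta1 * group + subject_effect)


def dropout_rate(mu, a, b):
    """Hypothetical logistic link between dropout rate and expression level.

    With b < 0, more highly expressed genes have a lower dropout rate.
    In a pooled fit, a and b would be shared across genes.
    """
    return 1.0 / (1.0 + math.exp(-(a + b * math.log(mu))))
```

In this sketch the group coefficient `beta1` plays the role of the disease effect, and keeping `subject_effect` as a separate term is what lets between-subject variation be separated from it.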
first_indexed 2024-03-10T16:57:04Z
format Article
id doaj.art-98bb000233a34cca8225fba23b3cf131
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-03-10T16:57:04Z
publishDate 2023-08-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-98bb000233a34cca8225fba23b3cf131
2023-11-20T11:06:39Z
eng
BMC
BMC Bioinformatics
1471-2105
2023-08-01
24(1):1-20
10.1186/s12859-023-05432-8
Yunqing Liu (Department of Biostatistics, Yale School of Public Health)
Jiayi Zhao (Department of Biostatistics, Yale School of Public Health)
Taylor S. Adams (Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine)
Ningya Wang (Department of Biostatistics, Yale School of Public Health)
Jonas C. Schupp (Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine)
Weimiao Wu (Department of Biostatistics, Yale School of Public Health)
John E. McDonough (Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine)
Geoffrey L. Chupp (Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine)
Naftali Kaminski (Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine)
Zuoheng Wang (Department of Biostatistics, Yale School of Public Health)
Xiting Yan (Department of Biostatistics, Yale School of Public Health)
title iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects
topic Single-cell RNA sequencing
Differential expression analysis
Subject effect
Zero-inflated negative binomial mixed model
url https://doi.org/10.1186/s12859-023-05432-8