A Markov random field model for network-based differential expression analysis of single-cell RNA-seq data

Abstract Background Recent development of single cell sequencing technologies has made it possible to identify genes with different expression (DE) levels at the cell type level between different groups of samples. In this article, we propose to borrow information through known biological networks t...

Bibliographic Details
Main Authors: Hongyu Li, Biqing Zhu, Zhichao Xu, Taylor Adams, Naftali Kaminski, Hongyu Zhao
Format: Article
Language: English
Published: BMC 2021-10-01
Series: BMC Bioinformatics
Subjects: Markov random field; Differential expression; scRNA-seq
Online Access: https://doi.org/10.1186/s12859-021-04412-0
collection DOAJ
description Abstract
Background: Recent developments in single-cell sequencing technologies have made it possible to identify genes with different expression (DE) levels at the cell-type level between different groups of samples. In this article, we propose to borrow information through known biological networks to increase the statistical power to identify differentially expressed genes (DEGs).
Results: We develop MRFscRNAseq, which is based on a Markov random field (MRF) model that accommodates gene network information as well as dependencies among cell types to identify cell-type-specific DEGs. We implement an Expectation-Maximization (EM) algorithm with a mean field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. A simulation study shows that our method has better power to detect cell-type-specific DEGs than conventional methods while appropriately controlling the type I error rate. The usefulness of our method is demonstrated through its application to the study of the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set, which contains 18,150 protein-coding genes across 38 cell types in lung tissues from 32 IPF patients and 28 normal controls.
Conclusions: The proposed MRF model is implemented in the R package MRFscRNAseq, available on GitHub. By utilizing gene-gene and cell-cell networks, our method increases statistical power to detect differentially expressed genes from scRNA-seq data.
first_indexed 2024-12-17T13:17:00Z
id doaj.art-73f8d1f15edc4208ad885fb4bf2897d7
institution Directory Open Access Journal
issn 1471-2105
last_indexed 2024-12-17T13:17:00Z
spelling doaj.art-73f8d1f15edc4208ad885fb4bf2897d7 2022-12-21T21:46:59Z eng BMC BMC Bioinformatics 1471-2105 2021-10-01 volume 22, issue 1, pages 1-16 10.1186/s12859-021-04412-0
A Markov random field model for network-based differential expression analysis of single-cell RNA-seq data
Hongyu Li (0), Biqing Zhu (1), Zhichao Xu (2), Taylor Adams (3), Naftali Kaminski (4), Hongyu Zhao (5)
0 Department of Biostatistics, School of Public Health, Yale University
1 Program of Computational Biology and Bioinformatics, Yale University
2 Department of Biostatistics, School of Public Health, Yale University
3 Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine
4 Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine
5 Department of Biostatistics, School of Public Health, Yale University
https://doi.org/10.1186/s12859-021-04412-0
Markov random field; Differential expression; scRNA-seq
topic Markov random field
Differential expression
scRNA-seq
url https://doi.org/10.1186/s12859-021-04412-0
work_keys_str_mv AT hongyuli amarkovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT biqingzhu amarkovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT zhichaoxu amarkovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT tayloradams amarkovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT naftalikaminski amarkovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT hongyuzhao amarkovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT hongyuli markovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT biqingzhu markovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT zhichaoxu markovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT tayloradams markovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT naftalikaminski markovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
AT hongyuzhao markovrandomfieldmodelfornetworkbaseddifferentialexpressionanalysisofsinglecellrnaseqdata
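
The abstract in this record mentions a Gibbs sampler that infers DE status under a Markov random field prior over a gene network. The following is a minimal Python sketch of that general idea only. It is not the MRFscRNAseq implementation: it handles a single cell type, ignores the cell-type network, fixes the prior parameters instead of estimating them with EM, and assumes each gene is summarized by a z-score that is N(0, 1) under the null and N(mu, 1) when DE. The function name gibbs_de_status and the parameters gamma, beta, and mu are hypothetical, chosen for illustration.

```python
import math
import random


def gibbs_de_status(z, neighbors, gamma=-2.0, beta=1.0, mu=3.0,
                    n_iter=2000, burn_in=500, seed=0):
    """Estimate P(gene is DE) under a toy binary MRF using Gibbs sampling.

    z          -- list of per-gene z-scores (assumed summary statistics)
    neighbors  -- dict mapping gene index -> list of neighbouring gene indices
    gamma/beta -- assumed prior parameters (fixed here, not estimated by EM)
    mu         -- assumed mean of z for DE genes; null genes are N(0, 1)
    """
    rng = random.Random(seed)
    n = len(z)
    x = [0] * n        # current DE labels, initialised to all-null
    counts = [0] * n   # number of post-burn-in sweeps in which a gene was DE
    for it in range(n_iter):
        for g in range(n):
            nbrs = neighbors.get(g, [])
            n_de = sum(x[h] for h in nbrs)
            n_null = len(nbrs) - n_de
            # Prior log-odds: baseline plus coupling to the neighbours' labels.
            prior_logit = gamma + beta * (n_de - n_null)
            # Log-likelihood ratio of N(mu, 1) versus N(0, 1) at z[g].
            lik_logit = mu * z[g] - 0.5 * mu * mu
            p_de = 1.0 / (1.0 + math.exp(-(prior_logit + lik_logit)))
            # Draw the label from its full conditional distribution.
            x[g] = 1 if rng.random() < p_de else 0
        if it >= burn_in:
            for g in range(n):
                counts[g] += x[g]
    return [c / (n_iter - burn_in) for c in counts]


# Genes 1 and 3 carry the same borderline evidence (z = 1.5), but gene 1 is
# linked to a strongly DE gene while gene 3 is linked to a clearly null gene.
z_scores = [4.0, 1.5, 0.0, 1.5]
network = {0: [1], 1: [0], 2: [3], 3: [2]}
posterior = gibbs_de_status(z_scores, network)
for gene, prob in enumerate(posterior):
    print(f"gene {gene}: estimated P(DE) = {prob:.3f}")
```

In this toy setup, a gene with borderline evidence receives a higher estimated DE probability when its network neighbour is DE, which is the sense in which a network prior borrows information across genes.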