Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

Bibliographic Details
Main Authors: Christoph Hafemeister, Rahul Satija
Affiliation: New York Genome Center
Format: Article
Language: English
Published: BMC, 2019-12-01
Series: Genome Biology
ISSN: 1474-760X
Subjects: Single-cell RNA-seq; Normalization
Online Access: https://doi.org/10.1186/s13059-019-1874-1

Abstract
Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from "regularized negative binomial regression," where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure eliminates the need for heuristic steps including pseudocount addition or log-transformation and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.
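The abstract outlines three steps: fit a negative binomial GLM per gene with sequencing depth as a covariate, stabilize the parameters by pooling genes of similar abundance, and use Pearson residuals as the normalized values. The following is a minimal NumPy sketch of that idea under stated assumptions; it is not the sctransform implementation, and all function names are hypothetical. Assumptions: the depth covariate is log10 of total UMIs per cell; coefficients come from a per-gene Poisson IRLS fit; the dispersion theta uses a method-of-moments estimator (swapped in for maximum likelihood); regularization is Gaussian kernel smoothing of the parameters against log10 mean expression with a hand-picked bandwidth (0.3); residuals are clipped to plus or minus sqrt(number of cells).

```python
import numpy as np

MAX_THETA = 1e5  # assumed cap for genes that show no overdispersion


def poisson_glm(y, X, n_iter=25):
    """Fit a log-link Poisson GLM by iteratively reweighted least squares (IRLS)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the gene's mean count (gene must be detected)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        XtW = X.T * mu           # Poisson/log-link IRLS weights equal mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta


def theta_mom(y, mu):
    """Method-of-moments NB dispersion from Var(y) = mu + mu^2 / theta (assumed estimator)."""
    excess = np.sum((y - mu) ** 2 - mu)
    if excess <= 0:
        return MAX_THETA
    return min(np.sum(mu ** 2) / excess, MAX_THETA)


def kernel_smooth(x, values, bandwidth):
    """Nadaraya-Watson smoothing of `values` against `x` with a Gaussian kernel."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return w @ values


def regularized_nb_residuals(counts, bandwidth=0.3):
    """Pearson residuals from a regularized NB model; counts is cells x genes."""
    counts = np.asarray(counts, dtype=float)
    n_cells, n_genes = counts.shape
    log_depth = np.log10(counts.sum(axis=1))  # sequencing depth covariate
    X = np.column_stack([np.ones(n_cells), log_depth])

    # Step 1: unconstrained per-gene fits (intercept, slope, log10 theta).
    params = np.empty((n_genes, 3))
    for g in range(n_genes):
        y = counts[:, g]
        beta = poisson_glm(y, X)
        theta = theta_mom(y, np.exp(X @ beta))
        params[g] = [beta[0], beta[1], np.log10(theta)]

    # Step 2: regularize by pooling genes with similar average abundance.
    log_mean = np.log10(counts.mean(axis=0))
    reg = kernel_smooth(log_mean, params, bandwidth)

    # Step 3: Pearson residuals under the regularized model, clipped to +/- sqrt(n_cells).
    mu = np.exp(X @ reg[:, :2].T)
    theta = 10 ** reg[:, 2]
    resid = (counts - mu) / np.sqrt(mu + mu ** 2 / theta)
    bound = np.sqrt(n_cells)
    return np.clip(resid, -bound, bound)


if __name__ == "__main__":
    # Simulated UMI counts: NB with theta = 10 and means proportional to cell depth.
    rng = np.random.default_rng(0)
    depth = rng.lognormal(mean=8, sigma=0.4, size=300)
    frac = rng.lognormal(mean=0, sigma=1, size=40)
    frac /= frac.sum()
    mu = np.outer(depth, frac)
    counts = rng.negative_binomial(10, 10 / (10 + mu))
    resid = regularized_nb_residuals(counts)
    print(resid.shape)  # prints (300, 40)
```

The smoothing step is where the "pooling information across genes" happens: each gene's fitted parameters are replaced by a weighted average over genes with similar log10 mean, which damps the overfitting the abstract attributes to an unconstrained model. Since the abstract lists variable gene selection as a downstream task, one plausible use of the output is to rank genes by residual variance, for example with `resid.var(axis=0)`.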