Demystifying “drop-outs” in single-cell UMI data

Abstract Many existing pipelines for scRNA-seq data apply pre-processing steps such as normalization or imputation to account for excessive zeros or “drop-outs.” Here, we extensively analyze diverse UMI data sets to show that clustering should be the foremost step of the workflow. We observe th...

Bibliographic Details
Main Authors: Tae Hyun Kim, Xiang Zhou, Mengjie Chen
Format: Article
Language: English
Published: BMC 2020-08-01
Series: Genome Biology
Online Access: http://link.springer.com/article/10.1186/s13059-020-02096-y
collection DOAJ
description Abstract Many existing pipelines for scRNA-seq data apply pre-processing steps such as normalization or imputation to account for excessive zeros or “drop-outs.” Here, we extensively analyze diverse UMI data sets to show that clustering should be the foremost step of the workflow. We observe that most drop-outs disappear once cell-type heterogeneity is resolved, while imputing or normalizing heterogeneous data can introduce unwanted noise. We propose a novel framework HIPPO (Heterogeneity-Inspired Pre-Processing tOol) that leverages zero proportions to explain cellular heterogeneity and integrates feature selection with iterative clustering. HIPPO leads to downstream analysis with greater flexibility and interpretability compared to alternatives.
first_indexed 2024-12-20T07:05:33Z
id doaj.art-36e9581b218c4dc9a279564f173b93c6
institution Directory Open Access Journal
issn 1474-760X
last_indexed 2024-12-20T07:05:33Z
spelling doaj.art-36e9581b218c4dc9a279564f173b93c6; 2022-12-21T19:49:04Z; eng; BMC; Genome Biology; 1474-760X; 2020-08-01; 21; 1; 1; 19; 10.1186/s13059-020-02096-y; Demystifying “drop-outs” in single-cell UMI data
Tae Hyun Kim (Department of Statistics, University of Chicago)
Xiang Zhou (Department of Biostatistics, University of Michigan)
Mengjie Chen (Department of Human Genetics and Department of Medicine, University of Chicago)