Demystifying “drop-outs” in single-cell UMI data
Abstract Many existing pipelines for scRNA-seq data apply pre-processing steps such as normalization or imputation to account for excessive zeros or “drop-outs.” Here, we extensively analyze diverse UMI data sets to show that clustering should be the foremost step of the workflow. We observe th...
Main Authors: | Tae Hyun Kim, Xiang Zhou, Mengjie Chen |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2020-08-01 |
Series: | Genome Biology |
Online Access: | http://link.springer.com/article/10.1186/s13059-020-02096-y |
_version_ | 1818942068440956928 |
---|---|
author | Tae Hyun Kim; Xiang Zhou; Mengjie Chen |
author_facet | Tae Hyun Kim; Xiang Zhou; Mengjie Chen |
author_sort | Tae Hyun Kim |
collection | DOAJ |
description | Abstract Many existing pipelines for scRNA-seq data apply pre-processing steps such as normalization or imputation to account for excessive zeros or “drop-outs.” Here, we extensively analyze diverse UMI data sets to show that clustering should be the foremost step of the workflow. We observe that most drop-outs disappear once cell-type heterogeneity is resolved, while imputing or normalizing heterogeneous data can introduce unwanted noise. We propose a novel framework HIPPO (Heterogeneity-Inspired Pre-Processing tOol) that leverages zero proportions to explain cellular heterogeneity and integrates feature selection with iterative clustering. HIPPO leads to downstream analysis with greater flexibility and interpretability compared to alternatives. |
first_indexed | 2024-12-20T07:05:33Z |
format | Article |
id | doaj.art-36e9581b218c4dc9a279564f173b93c6 |
institution | Directory Open Access Journal |
issn | 1474-760X |
language | English |
last_indexed | 2024-12-20T07:05:33Z |
publishDate | 2020-08-01 |
publisher | BMC |
record_format | Article |
series | Genome Biology |
spelling | doaj.art-36e9581b218c4dc9a279564f173b93c6; 2022-12-21T19:49:04Z; eng; BMC; Genome Biology; 1474-760X; 2020-08-01; 21; 1; 1; 19; 10.1186/s13059-020-02096-y; Demystifying “drop-outs” in single-cell UMI data; Tae Hyun Kim (0); Xiang Zhou (1); Mengjie Chen (2); Department of Statistics, University of Chicago; Department of Biostatistics, University of Michigan; Department of Human Genetics and Department of Medicine, University of Chicago; Abstract Many existing pipelines for scRNA-seq data apply pre-processing steps such as normalization or imputation to account for excessive zeros or “drop-outs.” Here, we extensively analyze diverse UMI data sets to show that clustering should be the foremost step of the workflow. We observe that most drop-outs disappear once cell-type heterogeneity is resolved, while imputing or normalizing heterogeneous data can introduce unwanted noise. We propose a novel framework HIPPO (Heterogeneity-Inspired Pre-Processing tOol) that leverages zero proportions to explain cellular heterogeneity and integrates feature selection with iterative clustering. HIPPO leads to downstream analysis with greater flexibility and interpretability compared to alternatives.; http://link.springer.com/article/10.1186/s13059-020-02096-y |
spellingShingle | Tae Hyun Kim; Xiang Zhou; Mengjie Chen; Demystifying “drop-outs” in single-cell UMI data; Genome Biology |
title | Demystifying “drop-outs” in single-cell UMI data |
title_full | Demystifying “drop-outs” in single-cell UMI data |
title_fullStr | Demystifying “drop-outs” in single-cell UMI data |
title_full_unstemmed | Demystifying “drop-outs” in single-cell UMI data |
title_short | Demystifying “drop-outs” in single-cell UMI data |
title_sort | demystifying drop outs in single cell umi data |
url | http://link.springer.com/article/10.1186/s13059-020-02096-y |
work_keys_str_mv | AT taehyunkim demystifyingdropoutsinsinglecellumidata AT xiangzhou demystifyingdropoutsinsinglecellumidata AT mengjiechen demystifyingdropoutsinsinglecellumidata |