Selecting single cell clustering parameter values using subsampling-based robustness metrics

Abstract Background Generating and analysing single-cell data has become a widespread approach to examine tissue heterogeneity, and numerous algorithms exist for clustering these datasets to identify putative cell types with shared transcriptomic signatures. However, many of these clustering workflo...


Bibliographic Details
Main Authors: Ryan B. Patterson-Cross, Ariel J. Levine, Vilas Menon
Format: Article
Language: English
Published: BMC 2021-02-01
Series: BMC Bioinformatics
Subjects: Single cell RNAseq, Parameter selection, Clustering, Resolution
Online Access: https://doi.org/10.1186/s12859-021-03957-4
collection DOAJ
description Abstract
Background: Generating and analysing single-cell data has become a widespread approach to examine tissue heterogeneity, and numerous algorithms exist for clustering these datasets to identify putative cell types with shared transcriptomic signatures. However, many of these clustering workflows rely on user-tuned parameter values, tailored to each dataset, to identify a set of biologically relevant clusters. Whereas users often develop their own intuition as to the optimal range of parameters for clustering on each data set, the lack of systematic approaches to identify this range can be daunting to new users of any given workflow. In addition, an optimal parameter set does not guarantee that all clusters are equally well-resolved, given the heterogeneity in transcriptomic signatures in most biological systems.
Results: Here, we illustrate a subsampling-based approach (chooseR) that simultaneously guides parameter selection and characterizes cluster robustness. Through bootstrapped iterative clustering across a range of parameters, chooseR was used to select parameter values for two distinct clustering workflows (Seurat and scVI). In each case, chooseR identified parameters that produced biologically relevant clusters from both well-characterized (human PBMC) and complex (mouse spinal cord) datasets. Moreover, it provided a simple “robustness score” for each of these clusters, facilitating the assessment of cluster quality.
Conclusion: chooseR is a simple, conceptually understandable tool that can be used flexibly across clustering algorithms, workflows, and datasets to guide clustering parameter selection and characterize cluster robustness.
first_indexed 2024-12-16T15:32:14Z
format Article
id doaj.art-b84d380938f047e4b749af13974c5eae
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-12-16T15:32:14Z
publishDate 2021-02-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-b84d380938f047e4b749af13974c5eae
2022-12-21T22:26:18Z
eng
BMC
BMC Bioinformatics
1471-2105
2021-02-01
volume 22, issue 1, pages 1-13
10.1186/s12859-021-03957-4
Selecting single cell clustering parameter values using subsampling-based robustness metrics
Ryan B. Patterson-Cross (Spinal Circuits and Plasticity Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health)
Ariel J. Levine (Spinal Circuits and Plasticity Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health)
Vilas Menon (Department of Neurology, Center for Translational and Computational Neuroimmunology, Columbia University)
https://doi.org/10.1186/s12859-021-03957-4
Single cell RNAseq
Parameter selection
Clustering
Resolution
title Selecting single cell clustering parameter values using subsampling-based robustness metrics
topic Single cell RNAseq
Parameter selection
Clustering
Resolution
url https://doi.org/10.1186/s12859-021-03957-4
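The subsampling idea summarized in the description (repeated clustering of subsamples across a range of parameter values, followed by a per-cluster robustness score) can be illustrated with a small sketch. This is not the chooseR implementation and does not use Seurat or scVI. The toy 1-D gap-based clustering, the function names (gap_cluster, coclustering_frequency, cluster_robustness), the 80% subsample fraction, the 50 iterations, and the use of mean within-cluster co-clustering frequency as the score are all assumptions made for illustration only.

```python
import random
from itertools import combinations


def gap_cluster(values, gap):
    """Toy 1-D clustering (stand-in for a real workflow): sort the points and
    start a new cluster whenever the distance to the previous point exceeds
    `gap`, which plays the role of the user-tuned parameter."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] > gap:
            current += 1
        labels[nxt] = current
    return labels


def coclustering_frequency(values, gap, n_iter=50, fraction=0.8, seed=0):
    """For each pair of points, the fraction of subsamples containing both
    points in which they landed in the same cluster."""
    rng = random.Random(seed)
    n = len(values)
    together = {}  # pair -> number of subsamples where the pair co-clustered
    sampled = {}   # pair -> number of subsamples containing both points
    size = max(2, int(fraction * n))
    for _ in range(n_iter):
        idx = sorted(rng.sample(range(n), size))  # subsample without replacement
        labels = gap_cluster([values[i] for i in idx], gap)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[(a, b)] = sampled.get((a, b), 0) + 1
            if la == lb:
                together[(a, b)] = together.get((a, b), 0) + 1
    return {pair: together.get(pair, 0) / count for pair, count in sampled.items()}


def cluster_robustness(values, gap, **kwargs):
    """Hypothetical robustness score: mean co-clustering frequency over all
    pairs inside each cluster found on the full data (NaN for singletons)."""
    freq = coclustering_frequency(values, gap, **kwargs)
    labels = gap_cluster(values, gap)
    scores = {}
    for cluster in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cluster]
        pairs = [freq[p] for p in combinations(members, 2) if p in freq]
        scores[cluster] = sum(pairs) / len(pairs) if pairs else float("nan")
    return scores


# Made-up data: three well-separated groups with 0.1 spacing inside each group.
values = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3, 10.0, 10.1, 10.2, 10.3]

for gap in (0.15, 1.0):
    print(gap, cluster_robustness(values, gap))
```

In this toy setting both parameter values give three clusters on the full data, but the smaller gap splits a group whenever a subsample drops one of its interior points, so its clusters score lower. Comparing scores across parameter values in this way is the kind of comparison the abstract describes, under the simplifications stated above.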