Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods

Single-cell RNA-seq (scRNAseq) is a powerful tool to study heterogeneity of cells. Recently, several clustering-based methods have been proposed to identify distinct cell populations. These methods are based on different statistical models and usually require several additional steps, suc...

Bibliographic Details
Main Authors: Monika Krzak, Yordan Raykov, Alexis Boukouvalas, Luisa Cutillo, Claudia Angelini
Format: Article
Language: English
Published: Frontiers Media S.A. 2019-12-01
Series: Frontiers in Genetics
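Subjects: single-cell RNA-seq; clustering methods; benchmark; parameter sensitivity analysis; high-dimensional data analysis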
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2019.01253/full
author Monika Krzak
Yordan Raykov
Alexis Boukouvalas
Luisa Cutillo
Claudia Angelini
collection DOAJ
description Single-cell RNA-seq (scRNAseq) is a powerful tool to study heterogeneity of cells. Recently, several clustering-based methods have been proposed to identify distinct cell populations. These methods are based on different statistical models and usually require several additional steps, such as preprocessing or dimension reduction, before the clustering algorithm is applied. Individual steps are often controlled by method-specific parameters, permitting the method to be used in different modes on the same datasets, depending on the user's choices. The large number of possibilities that these methods provide can intimidate non-expert users, since the available choices are not always clearly documented. In addition, to date, no large studies have investigated the role and the impact that these choices can have in different experimental contexts. This work aims to provide new insights into the advantages and drawbacks of scRNAseq clustering methods and to describe the range of possibilities offered to users. In particular, we provide an extensive evaluation of several methods with respect to different modes of usage and parameter settings by applying them to real and simulated datasets that vary in terms of dimensionality, number of cell populations, or levels of noise. Remarkably, the results presented here show that great variability in the performance of the models is strongly attributed to the choice of user-specific parameter settings. We describe several tendencies in the performance attributed to their modes of usage and different types of datasets, and identify which methods are strongly affected by data dimensionality in terms of computational time. Finally, we highlight some open challenges in scRNAseq data clustering, such as those related to the identification of the number of clusters.
first_indexed 2024-04-14T05:20:38Z
format Article
id doaj.art-9f33f15e483843319c6adfa621fcc443
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-04-14T05:20:38Z
publishDate 2019-12-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-9f33f15e483843319c6adfa621fcc443; 2022-12-22T02:10:12Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2019-12-01; 10; DOI 10.3389/fgene.2019.01253; 486077; Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods
Monika Krzak (Institute for Applied Mathematics “Mauro Picone”, Naples, Italy)
Yordan Raykov (Department of Mathematics, Aston University, Birmingham, United Kingdom)
Alexis Boukouvalas (Machine Learning Engineer Team, Prowler.io, Cambridge, United Kingdom)
Luisa Cutillo (School of Mathematics, University of Leeds, Leeds, United Kingdom)
Claudia Angelini (Institute for Applied Mathematics “Mauro Picone”, Naples, Italy)
title Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods
topic single-cell RNA-seq
clustering methods
benchmark
parameter sensitivity analysis
high-dimensional data analysis
url https://www.frontiersin.org/article/10.3389/fgene.2019.01253/full