Stability of scRNA-Seq Analysis Workflows is Susceptible to Preprocessing and is Mitigated by Regularized or Supervised Approaches

Background: Statistical methods developed to address various questions in single-cell datasets show increased variability across different parameter regimes. To further delineate the robustness of commonly used methods for single-cell RNA-Seq, we aimed to comprehensively review scRNA-Seq an...


Bibliographic Details
Main Authors: Arda Durmaz, Jacob G Scott
Format: Article
Language: English
Published: SAGE Publishing 2022-09-01
Series: Evolutionary Bioinformatics
Online Access: https://doi.org/10.1177/11769343221123050
collection DOAJ
description Background: Statistical methods developed to address various questions in single-cell datasets show increased variability across different parameter regimes. To further delineate the robustness of commonly used methods for single-cell RNA-Seq, we aimed to comprehensively review scRNA-Seq analysis workflows in the setting of dimension reduction, clustering, and trajectory inference. Methods: We used datasets with temporal single-cell transcriptomics profiles from public repositories. Combining multiple methods at each level of the workflow, we performed over 6k analyses and evaluated the results of clustering and pseudotime estimation using the adjusted Rand index and rank correlation metrics. We further integrated neural network methods to assess whether models with increased complexity can show an increased bias/variance trade-off. Results: Combinatorial workflows showed that non-linear dimension reduction techniques such as t-SNE and UMAP are sensitive to initial preprocessing steps; hence, clustering results in the dimension-reduced space of single-cell datasets should be used with care. Similarly, pseudotime estimation methods that depend on previous non-linear dimension reduction steps can result in highly variable trajectories. In contrast, methods that avoid non-linearity, such as WOT, can result in repeatable inferences of temporal gene expression dynamics. Furthermore, imputation methods do not substantially improve the repeatability of clustering or trajectory inference results. In contrast, the selection of the normalization method shows an increased effect on downstream analysis, with ScTransform reducing variability overall.
first_indexed 2024-04-11T10:39:31Z
id doaj.art-e9e80e7c3f3c4db4803e4f21ce9acec8
institution Directory Open Access Journal
issn 1176-9343
last_indexed 2024-04-11T10:39:31Z
spelling doaj.art-e9e80e7c3f3c4db4803e4f21ce9acec8; 2022-12-22T04:29:13Z; eng; SAGE Publishing; Evolutionary Bioinformatics; 1176-9343; 2022-09-01; 18; 10.1177/11769343221123050; Stability of scRNA-Seq Analysis Workflows is Susceptible to Preprocessing and is Mitigated by Regularized or Supervised Approaches; Arda Durmaz (0); Jacob G Scott (1); Systems Biology and Bioinformatics Graduate Program, Case Western Reserve University, Cleveland, OH, USA; Department of Translational Hematology and Oncology Research, Cleveland Clinic, Cleveland, OH, USA; https://doi.org/10.1177/11769343221123050
title Stability of scRNA-Seq Analysis Workflows is Susceptible to Preprocessing and is Mitigated by Regularized or Supervised Approaches
url https://doi.org/10.1177/11769343221123050