Single Cell Self-Paced Clustering with Transcriptome Sequencing Data

Single cell RNA sequencing (scRNA-seq) allows researchers to explore tissue heterogeneity, distinguish unusual cell identities, and find novel cellular subtypes by providing transcriptome profiling for individual cells. Clustering analysis is usually used to predict cell class assignments and infer...

Full description

Bibliographic Details
Main Authors: Peng Zhao, Zenglin Xu, Junjie Chen, Yazhou Ren, Irwin King
Format: Article
Language:English
Published: MDPI AG 2022-03-01
Series:International Journal of Molecular Sciences
Subjects:
Online Access:https://www.mdpi.com/1422-0067/23/7/3900
Description
Summary:Single cell RNA sequencing (scRNA-seq) allows researchers to explore tissue heterogeneity, distinguish unusual cell identities, and find novel cellular subtypes by providing transcriptome profiling for individual cells. Clustering analysis is usually used to predict cell class assignments and infer cell identities. However, the performance of existing single-cell clustering methods is extremely sensitive to the presence of noise data and outliers. Existing clustering algorithms can easily fall into local optimal solutions. There is still no consensus on the best performing method. To address this issue, we introduce a single cell self-paced clustering (scSPaC) method with F-norm based nonnegative matrix factorization (NMF) for scRNA-seq data and a sparse single cell self-paced clustering (sscSPaC) method with <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><msub><mi>l</mi><mn>21</mn></msub></semantics></math></inline-formula>-norm based nonnegative matrix factorization for scRNA-seq data. We gradually add single cells from simple to complex to our model until all cells are selected. In this way, the influences of noisy data and outliers can be significantly reduced. The proposed method achieved the best performance on both simulation data and real scRNA-seq data. A case study about human clara cells and ependymal cells scRNA-seq data clustering shows that scSPaC is more advantageous near the clustering dividing line.
ISSN:1661-6596
1422-0067