Summary: | Single-cell clustering helps identify distinct cell types, rare cell types in particular. Preprocessing and dimensionality reduction are the two most common data-processing steps and strongly influence single-cell clustering. However, we found that different preprocessing and dimensionality reduction methods affect clustering results very differently, and no single combination of the two appears to work well for all datasets. In this study, we developed SCM, a new algorithm for improving single-cell clustering results. SCM first automatically searches for the combination of preprocessing and dimensionality reduction methods that yields the best cell-type clustering for a given dataset. It then defines a flexible, dataset-specific cell-to-cell distance measure for cell-type clustering. In experiments on ten benchmark datasets, SCM outperformed nearly all of the seven other popular clustering algorithms. For example, its average ARI (adjusted Rand index) improvement over the second-best method, SC3, reached 29.31% across the ten datasets, demonstrating its potential for revealing cellular heterogeneity, identifying cell types, depicting cell functional states, inferring cellular dynamics, and related research areas.
|
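The ARI (adjusted Rand index) cited in the summary scores how well predicted clusters agree with reference cell-type labels, corrected for chance agreement. As a minimal sketch, and not the authors' evaluation code, it can be computed from contingency counts using only the Python standard library. The function name `adjusted_rand_index` and the toy labels below are illustrative assumptions, not taken from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells.

    Hypothetical helper for illustration; not part of SCM.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the labelings trivially agree

    # Count cell pairs that share a contingency cell, a true class, or a cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2         # agreement of a perfect match
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Toy example: four cells, three reference types, two predicted clusters.
reference = ["T", "T", "B", "NK"]
predicted = [0, 0, 1, 1]
print(round(adjusted_rand_index(reference, predicted), 4))  # prints 0.5714
```

A score of 1 means the clustering matches the reference labels exactly (up to renaming of clusters), values near 0 indicate chance-level agreement, and negative values indicate agreement worse than chance.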