scTransSort: Transformers for Intelligent Annotation of Cell Types by Gene Embeddings

Bibliographic Details
Main Authors: Linfang Jiao, Gan Wang, Huanhuan Dai, Xue Li, Shuang Wang, Tao Song
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Biomolecules
Subjects: scRNA-seq, cell type, classification, annotation, identity, transformer
Online Access: https://www.mdpi.com/2218-273X/13/4/611
collection DOAJ
description Single-cell transcriptomics is rapidly advancing our understanding of the composition of complex tissues and biological cells, and single-cell RNA sequencing (scRNA-seq) holds great potential for identifying and characterizing the cell composition of complex tissues. Cell type identification from scRNA-seq data is limited mostly by manual annotation, which is time-consuming and irreproducible. As scRNA-seq technology scales to thousands of cells per experiment, the rapid increase in the number of cell samples makes manual annotation even more difficult. In addition, the sparsity of gene transcriptome data remains a major challenge. This paper applies the idea of the transformer to single-cell classification tasks based on scRNA-seq data. We propose scTransSort, a cell-type annotation method pretrained on single-cell transcriptomics data. scTransSort represents genes as gene expression embedding blocks, which reduces both the sparsity of the data used for cell type identification and the computational complexity. Its distinguishing feature is intelligent information extraction from unordered data: it automatically extracts valid features of cell types without requiring manually labeled features or additional references. In experiments on cells from 35 human and 26 mouse tissues, scTransSort achieved high accuracy and performance in cell type identification and demonstrated high robustness and generalization ability.
first_indexed 2024-03-11T05:12:32Z
format Article
id doaj.art-f0a276857b6042eeb3b10116e31fd4b9
institution Directory Open Access Journal
issn 2218-273X
language English
last_indexed 2024-03-11T05:12:32Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Biomolecules
spelling doaj.art-f0a276857b6042eeb3b10116e31fd4b9
timestamp 2023-11-17T18:28:54Z
language eng
publisher MDPI AG
journal Biomolecules (ISSN 2218-273X)
published 2023-03-01, volume 13, issue 4, article 611
doi 10.3390/biom13040611
title scTransSort: Transformers for Intelligent Annotation of Cell Types by Gene Embeddings
authors Linfang Jiao, Gan Wang, Huanhuan Dai, Xue Li, Shuang Wang, Tao Song
affiliation (all authors) College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
url https://www.mdpi.com/2218-273X/13/4/611
keywords scRNA-seq, cell type, classification, annotation, identity, transformer
title scTransSort: Transformers for Intelligent Annotation of Cell Types by Gene Embeddings
topic scRNA-seq
cell type
classification
annotation
identity
transformer
url https://www.mdpi.com/2218-273X/13/4/611