SAIC: an iterative clustering approach for analysis of single cell RNA-seq data



Bibliographic Details
Main Authors: Lu Yang, Jiancheng Liu, Qiang Lu, Arthur D. Riggs, Xiwei Wu
Format: Article
Language: English
Published: BMC 2017-10-01
Series: BMC Genomics
Subjects: Single cell, RNA-seq, Clustering, K-means, ANOVA, PCA
Online Access: http://link.springer.com/article/10.1186/s12864-017-4019-5
author Lu Yang
Jiancheng Liu
Qiang Lu
Arthur D. Riggs
Xiwei Wu
collection DOAJ
description Abstract
Background: Interest in single-cell analysis has grown considerably in basic, translational and clinical research, as advances in whole-transcriptome amplification techniques now allow scientists to obtain accurate sequencing results at the single-cell level. An important step in single-cell transcriptome analysis is to identify distinct cell groups that have different gene expression patterns. Currently, few bioinformatics approaches are available for single-cell RNA-seq analysis. Many studies rely on principal component analysis (PCA) with arbitrary parameters to identify the genes that will be used to cluster the single cells.
Results: We have developed a novel algorithm, called SAIC (Single cell Analysis via Iterative Clustering), that identifies the optimal set of signature genes to separate single cells into distinct groups. Our method uses an iterative clustering approach to perform an exhaustive search for the best parameters within the search space, which is defined by the number of initial centers and the P values. The end point is the identification of a signature gene set that gives the best separation of the cell clusters. Using a simulated data set, we showed that SAIC can successfully identify the predefined signature gene sets that correctly separate the cells into predefined clusters. We applied SAIC to two published single-cell RNA-seq datasets. For both datasets, SAIC was able to identify a subset of signature genes that cluster the single cells into groups consistent with the published results. The signature genes identified by SAIC resulted in better clusters of cells based on the DB index score, and many genes also showed tissue-specific expression.
Conclusions: In summary, we have developed an efficient algorithm to identify the optimal subset of genes that separates single cells into distinct clusters based on their expression patterns. Using published single-cell RNA-seq datasets, we have shown that it performs better than the PCA method.
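The abstract ranks clusterings by their DB index score. Assuming this refers to the Davies-Bouldin index (lower values mean tighter, better separated clusters), the following is a minimal sketch of how such a score could be computed for a fixed cell-to-cluster assignment. It is not the SAIC implementation; the function name `davies_bouldin` and the toy expression values are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard clustering (lower is better).

    points: one coordinate sequence per cell (e.g. expression of selected genes)
    labels: one cluster label per cell
    Assumes at least two clusters with distinct centroids.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    if len(groups) < 2:
        raise ValueError("need at least two clusters")

    # Centroid of each cluster: per-gene mean over its cells.
    centroids = {
        label: [sum(column) / len(cells) for column in zip(*cells)]
        for label, cells in groups.items()
    }
    # Scatter of each cluster: mean distance of its cells to the centroid.
    scatter = {
        label: sum(math.dist(cell, centroids[label]) for cell in cells) / len(cells)
        for label, cells in groups.items()
    }
    # For each cluster, take the worst (largest) similarity ratio to any other.
    total = 0.0
    for i in groups:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in groups
            if j != i
        )
    return total / len(groups)


# Hypothetical toy data: gene 1 separates the two groups, gene 2 is noise.
labels = [0, 0, 1, 1]
all_genes = [(0.0, 0.0), (2.0, 8.0), (10.0, 8.0), (12.0, 0.0)]
signature_only = [(g1,) for g1, _ in all_genes]

db_all = davies_bouldin(all_genes, labels)
db_signature = davies_bouldin(signature_only, labels)
print(db_signature < db_all)  # the signature gene alone scores lower (better)
```

In this toy case, dropping the noisy gene lowers the score, which is the kind of comparison the abstract describes between a signature gene set and a less selective gene set.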
first_indexed 2024-04-12T00:50:16Z
format Article
id doaj.art-5974dd765a4844d98ae7a8a3465c24db
institution Directory of Open Access Journals
issn 1471-2164
language English
last_indexed 2024-04-12T00:50:16Z
publishDate 2017-10-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-5974dd765a4844d98ae7a8a3465c24db 2022-12-22T03:54:45Z eng BMC BMC Genomics 1471-2164 2017-10-01 18S6917 10.1186/s12864-017-4019-5
SAIC: an iterative clustering approach for analysis of single cell RNA-seq data
Lu Yang (0), Jiancheng Liu (1), Qiang Lu (2), Arthur D. Riggs (3), Xiwei Wu (4)
(0) Integrative Genomics Core, Beckman Research Institute, City of Hope
(1) Department of Developmental and Stem Cell Biology, Beckman Research Institute, City of Hope
(2) Department of Developmental and Stem Cell Biology, Beckman Research Institute, City of Hope
(3) Diabetes and Metabolism Research Institute, City of Hope
(4) Integrative Genomics Core, Beckman Research Institute, City of Hope
http://link.springer.com/article/10.1186/s12864-017-4019-5
Single cell, RNA-seq, Clustering, K-means, ANOVA, PCA
title SAIC: an iterative clustering approach for analysis of single cell RNA-seq data
topic Single cell
RNA-seq
Clustering
K-means
ANOVA
PCA
url http://link.springer.com/article/10.1186/s12864-017-4019-5