Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data [version 3; peer review: 2 approved, 1 approved with reservations]

Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited d...


Bibliographic Details
Main Authors: J. Javier Diaz-Mejia, Elaine C. Meng, Alexander R. Pico, Sonya A. MacParland, Troy Ketela, Trevor J. Pugh, Gary D. Bader, John H. Morris
Format: Article
Language: English
Published: F1000 Research Ltd 2019-10-01
Series: F1000Research
Online Access: https://f1000research.com/articles/8-296/v3
author J. Javier Diaz-Mejia
Elaine C. Meng
Alexander R. Pico
Sonya A. MacParland
Troy Ketela
Trevor J. Pugh
Gary D. Bader
John H. Morris
author_sort J. Javier Diaz-Mejia
collection DOAJ
description Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. This is partially due to the scarcity of reference cell type signatures and because some methods support limited cell type signatures. Methods: In this study, we benchmarked five methods representing first-generation enrichment analysis (ORA), second-generation approaches (GSEA and GSVA), machine learning tools (CIBERSORT) and network-based neighbor voting (METANEIGHBOR), for the task of assigning cell type labels to cell clusters from scRNA-seq data. We used five scRNA-seq datasets: human liver, 11 Tabula Muris mouse tissues, two human peripheral blood mononuclear cell datasets, and mouse retinal neurons, for which reference cell type signatures were available. The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells. Results: Our results show that, in general, all five methods perform well in the task as evaluated by receiver operating characteristic curve analysis (average area under the curve (AUC) = 0.91, sd = 0.06), whereas precision-recall analyses show a wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24). We observed an influence of the number of genes in cell type signatures on performance, with smaller signatures leading more frequently to incorrect results. Conclusions: GSVA was the overall top performer and was more robust in cell type signature subsampling simulations, although different methods performed well using different datasets. METANEIGHBOR and GSVA were the fastest methods. 
CIBERSORT and METANEIGHBOR were more influenced than the other methods by analyses including only expected cell types. We provide an extensible framework that can be used to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling.
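The gap between the ROC and precision-recall summaries reported above is a familiar pattern when positives are rare, as when each cluster has only a few correct labels among many candidate cell types. The following is a minimal sketch of that effect on made-up data. The labels, the scores, and the use of average precision as the precision-recall summary are all assumptions for illustration; none of this is taken from the article's framework, and it is not offered as the explanation for the article's numbers.

```python
# Toy illustration (hypothetical data): ROC AUC vs. a precision-recall summary
# when only 2 of 20 candidate (cluster, cell type) pairs are correct.

def roc_auc(labels, scores):
    """Probability that a random positive scores above a random negative
    (ties count as half), which equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each true positive.
    Assumes at least one positive and no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Two correct labels ranked 2nd and 4th; the other 18 candidates are wrong.
labels = [0, 1, 0, 1] + [0] * 16
scores = [0.95, 0.90, 0.85, 0.80] + [0.75 - 0.04 * i for i in range(16)]

print(f"ROC AUC           = {roc_auc(labels, scores):.3f}")
print(f"Average precision = {average_precision(labels, scores):.3f}")
```

In this toy case the two correct labels outrank 33 of the 36 positive-negative pairs, so the ROC AUC is about 0.917, while the average precision is only 0.5 because each correct label sits just below an incorrect one.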
first_indexed 2024-12-23T20:51:41Z
format Article
id doaj.art-25cf40df8e1541cc95186df61d5529c0
institution Directory Open Access Journal
issn 2046-1402
language English
last_indexed 2024-12-23T20:51:41Z
publishDate 2019-10-01
publisher F1000 Research Ltd
record_format Article
series F1000Research
spelling doaj.art-25cf40df8e1541cc95186df61d5529c0
2022-12-21T17:31:38Z
eng
F1000 Research Ltd
F1000Research
2046-1402
2019-10-01
8
10.12688/f1000research.18490.3
22823
Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data [version 3; peer review: 2 approved, 1 approved with reservations]
J. Javier Diaz-Mejia (0)
Elaine C. Meng (1)
Alexander R. Pico (2)
Sonya A. MacParland (3)
Troy Ketela (4)
Trevor J. Pugh (5)
Gary D. Bader (6)
John H. Morris (7)
(0) Princess Margaret Cancer Centre, University Health Network, Toronto, ON, M5G 2M9, Canada
(1) Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, 94143, USA
(2) Gladstone Institutes, San Francisco, CA, 94158, USA
(3) Multi-Organ Transplant Program, Toronto General Hospital Research Institute, Toronto, ON, M5G 2C4, Canada
(4) Princess Margaret Cancer Centre, University Health Network, Toronto, ON, M5G 2M9, Canada
(5) Princess Margaret Cancer Centre, University Health Network, Toronto, ON, M5G 2M9, Canada
(6) The Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
(7) Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, 94143, USA
https://f1000research.com/articles/8-296/v3
title Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data [version 3; peer review: 2 approved, 1 approved with reservations]
url https://f1000research.com/articles/8-296/v3