HAL-X: Scalable hierarchical clustering for rapid and tunable single-cell analysis.


Bibliographic Details
Main Authors: James Anibal, Alexandre G Day, Erol Bahadiroglu, Liam O'Neil, Long Phan, Alec Peltekian, Amir Erez, Mariana Kaplan, Grégoire Altan-Bonnet, Pankaj Mehta
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-10-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1010349
collection DOAJ
description Data clustering plays a significant role in biomedical sciences, particularly in single-cell data analysis. Researchers use clustering algorithms to group individual cells into populations that can be evaluated across different levels of disease progression, drug response, and other clinical statuses. In many cases, multiple sets of clusters must be generated to assess varying levels of cluster specificity. For example, there are many subtypes of leukocytes (e.g. T cells), whose individual preponderance and phenotype must be assessed for statistical/functional significance. In this report, we introduce a novel hierarchical density clustering algorithm (HAL-x) that uses supervised linkage methods to build a cluster hierarchy on raw single-cell data. With this new approach, HAL-x can quickly predict multiple sets of labels for immense datasets, achieving a considerable improvement in computational efficiency on large datasets compared to existing methods. We also show that cell clusters generated by HAL-x yield near-perfect F1-scores when classifying different clinical statuses based on single-cell profiles. Our hierarchical density clustering algorithm achieves high accuracy in single cell classification in a scalable, tunable and rapid manner.
first_indexed 2024-04-10T20:32:38Z
id doaj.art-398139c400994a5c82fd652cce8b1d87
institution Directory Open Access Journal
issn 1553-734X, 1553-7358
last_indexed 2024-04-10T20:32:38Z
spelling doaj.art-398139c400994a5c82fd652cce8b1d87; 2023-01-25T05:31:58Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; ISSN 1553-734X, 1553-7358; 2022-10-01; vol. 18, no. 10, e1010349; doi:10.1371/journal.pcbi.1010349