Determining clinically relevant features in cytometry data using persistent homology.

Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets are the current standard for cytometry data analysis. However, this approach is unable to capture more s...

Bibliographic Details
Main Authors:	Soham Mukherjee, Darren Wethington, Tamal K Dey, Jayajit Das
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2022-03-01
Series:	PLoS Computational Biology
Online Access:	https://doi.org/10.1371/journal.pcbi.1009931

collection	DOAJ
description	Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets are the current standard for cytometry data analysis. However, this approach is unable to capture more subtle topological features hidden in the data, especially if those features are further masked by data transforms, significant batch effects, or donor-to-donor variations in clinical data. We show that persistent homology, a mathematical structure that summarizes topological features, can effectively distinguish different sources of data, such as groups of healthy donors or patients. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between single-cell protein expression in COVID-19 patients and healthy controls. We identify proteins of interest with a decision-tree-based classifier, sample points randomly, and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions of varying density in cytometry datasets and identify protruded structures such as 'elbows'. We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find that systematic structural differences exist between COVID-19 patients and healthy controls in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that expression of T-bet and Eomes is significantly downregulated in COVID-19 patient non-naïve CD8+ T cells compared to healthy controls. This counter-intuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than in healthy controls. This method is applicable to any cytometry dataset for discovering novel insights through topological data analysis that may be difficult to ascertain with a standard gating strategy or existing bioinformatic tools.
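The pipeline described above (sample points, compute persistence diagrams, compare them with a Wasserstein distance) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions: it covers only degree-0 persistence (connected components) of a Vietoris-Rips filtration, it uses the 1-Wasserstein distance with an L-infinity ground metric, and it replaces real cytometry measurements with synthetic Gaussian point clouds. The function names `h0_deaths`, `wasserstein1_h0`, and `sample_cloud` are hypothetical and chosen for illustration only.

```python
import math
import random
from itertools import combinations


def h0_deaths(points):
    """Death times of the finite degree-0 bars of a Vietoris-Rips filtration.

    Every component is born at scale 0 and dies at the length of the edge
    that merges it into another component (a minimum-spanning-tree edge).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def wasserstein1_h0(deaths_a, deaths_b):
    """1-Wasserstein distance between two degree-0 diagrams with births at 0.

    Assumption: L-infinity ground metric, so matching bars (0, a) and (0, b)
    costs |a - b| and sending a bar (0, d) to the diagonal costs d / 2.
    All points lie on one line, so an optimal matching can be taken to be
    non-crossing and found with an edit-distance style dynamic program.
    """
    a, b = sorted(deaths_a), sorted(deaths_b)
    dp = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = dp[i - 1][0] + a[i - 1] / 2
    for j in range(1, len(b) + 1):
        dp[0][j] = dp[0][j - 1] + b[j - 1] / 2
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + a[i - 1] / 2,                   # a-bar to diagonal
                dp[i][j - 1] + b[j - 1] / 2,                   # b-bar to diagonal
                dp[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),   # match the two bars
            )
    return dp[-1][-1]


def sample_cloud(center, spread, n, rng):
    """Synthetic 2-D stand-in for a random sample of two protein expressions."""
    return [(rng.gauss(center, spread), rng.gauss(center, spread)) for _ in range(n)]


rng = random.Random(0)
tight = h0_deaths(sample_cloud(0.0, 0.1, 40, rng))  # dense population
loose = h0_deaths(sample_cloud(0.0, 1.0, 40, rng))  # diffuse population
print("tight vs tight:", wasserstein1_h0(tight, tight))
print("tight vs loose:", wasserstein1_h0(tight, loose))
```

Features such as the 'elbows' mentioned in the description involve more than connected components, and general diagrams need a full optimal-matching solver, so a dedicated topological data analysis library would be the practical choice for real data.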
first_indexed	2024-04-11T07:02:40Z
id	doaj.art-a8882e979b7d4715acc9a9a6eec8a863
institution	Directory Open Access Journal
issn	1553-734X 1553-7358
last_indexed	2024-04-11T07:02:40Z
spelling	PLoS Computational Biology 18(3): e1009931 (2022-03-01). https://doi.org/10.1371/journal.pcbi.1009931

Similar Items