Label-aware distance mitigates temporal and spatial variability for clustering and visualization of single-cell gene expression data

Bibliographic Details
Main Authors: Shaoheng Liang, Jinzhuang Dou, Ramiz Iqbal, Ken Chen
Format: Article
Language: English
Published: Nature Portfolio 2024-03-01
Series: Communications Biology
Online Access: https://doi.org/10.1038/s42003-024-05988-y
collection DOAJ
description Abstract Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered from different times, tissues, and patients, introduces large between-group distance and obscures the true identities of cells. To solve this problem, we introduce Label-Aware Distance (Lad), a metric using temporal/spatial locality of the batch effect to control for such factors. We validate Lad on simulated data as well as apply it to a mouse retina development dataset and a lung dataset. We also found the utility of our approach in understanding the progression of the Coronavirus Disease 2019 (COVID-19). Lad provides better cell embedding than state-of-the-art batch correction methods on longitudinal datasets. It can be used in distance-based clustering and visualization methods to combine the power of multiple samples to help make biological findings.
first_indexed 2024-04-24T23:03:38Z
id doaj.art-622967bd5211402ea382aaa6e8f828d1
institution Directory Open Access Journal
issn 2399-3642
last_indexed 2024-04-24T23:03:38Z
spelling doaj.art-622967bd5211402ea382aaa6e8f828d1 | 2024-03-17T12:35:26Z | eng | Nature Portfolio | Communications Biology | 2399-3642 | 2024-03-01 | 7118 | 10.1038/s42003-024-05988-y | Label-aware distance mitigates temporal and spatial variability for clustering and visualization of single-cell gene expression data | Shaoheng Liang (0), Jinzhuang Dou (1), Ramiz Iqbal (2), Ken Chen (3) | Affiliation listed for each author: Department of Bioinformatics and Computational Biology | https://doi.org/10.1038/s42003-024-05988-y