Label-aware distance mitigates temporal and spatial variability for clustering and visualization of single-cell gene expression data
Abstract Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered from different times, tissues, and patients, introduces la...
Main Authors: | Shaoheng Liang, Jinzhuang Dou, Ramiz Iqbal, Ken Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio 2024-03-01 |
Series: | Communications Biology |
Online Access: | https://doi.org/10.1038/s42003-024-05988-y |
_version_ | 1827315831529275392 |
---|---|
author | Shaoheng Liang Jinzhuang Dou Ramiz Iqbal Ken Chen |
author_facet | Shaoheng Liang Jinzhuang Dou Ramiz Iqbal Ken Chen |
author_sort | Shaoheng Liang |
collection | DOAJ |
description | Abstract Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered from different times, tissues, and patients, introduces large between-group distance and obscures the true identities of cells. To solve this problem, we introduce Label-Aware Distance (Lad), a metric using temporal/spatial locality of the batch effect to control for such factors. We validate Lad on simulated data as well as apply it to a mouse retina development dataset and a lung dataset. We also found the utility of our approach in understanding the progression of the Coronavirus Disease 2019 (COVID-19). Lad provides better cell embedding than state-of-the-art batch correction methods on longitudinal datasets. It can be used in distance-based clustering and visualization methods to combine the power of multiple samples to help make biological findings. |
first_indexed | 2024-04-24T23:03:38Z |
format | Article |
id | doaj.art-622967bd5211402ea382aaa6e8f828d1 |
institution | Directory Open Access Journal |
issn | 2399-3642 |
language | English |
last_indexed | 2024-04-24T23:03:38Z |
publishDate | 2024-03-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Communications Biology |
spelling | doaj.art-622967bd5211402ea382aaa6e8f828d1; 2024-03-17T12:35:26Z; eng; Nature Portfolio; Communications Biology; 2399-3642; 2024-03-01; 7118; 10.1038/s42003-024-05988-y; Label-aware distance mitigates temporal and spatial variability for clustering and visualization of single-cell gene expression data; Shaoheng Liang 0; Jinzhuang Dou 1; Ramiz Iqbal 2; Ken Chen 3; Department of Bioinformatics and Computational Biology; Department of Bioinformatics and Computational Biology; Department of Bioinformatics and Computational Biology; Department of Bioinformatics and Computational Biology; Abstract Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered from different times, tissues, and patients, introduces large between-group distance and obscures the true identities of cells. To solve this problem, we introduce Label-Aware Distance (Lad), a metric using temporal/spatial locality of the batch effect to control for such factors. We validate Lad on simulated data as well as apply it to a mouse retina development dataset and a lung dataset. We also found the utility of our approach in understanding the progression of the Coronavirus Disease 2019 (COVID-19). Lad provides better cell embedding than state-of-the-art batch correction methods on longitudinal datasets. It can be used in distance-based clustering and visualization methods to combine the power of multiple samples to help make biological findings.; https://doi.org/10.1038/s42003-024-05988-y |
title | Label-aware distance mitigates temporal and spatial variability for clustering and visualization of single-cell gene expression data |
title_sort | label aware distance mitigates temporal and spatial variability for clustering and visualization of single cell gene expression data |
url | https://doi.org/10.1038/s42003-024-05988-y |