Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data

Summary: Transcriptomic analysis plays a key role in biomedical research. Linear dimensionality reduction methods, especially principal-component analysis (PCA), are widely used in detecting sample-to-sample heterogeneity, while recently developed non-linear methods, such as t-distributed stochastic...

Bibliographic Details
Main Authors: Yang Yang, Hongjian Sun, Yu Zhang, Tiefu Zhang, Jialei Gong, Yunbo Wei, Yong-Gang Duan, Minglei Shu, Yuchen Yang, Di Wu, Di Yu
Format: Article
Language: English
Published: Elsevier 2021-07-01
Series: Cell Reports
Subjects: bulk transcriptomics; dimensionality reduction; UMAP; t-SNE; PCA; clustering structure
Online Access: http://www.sciencedirect.com/science/article/pii/S2211124721008597
author Yang Yang
Hongjian Sun
Yu Zhang
Tiefu Zhang
Jialei Gong
Yunbo Wei
Yong-Gang Duan
Minglei Shu
Yuchen Yang
Di Wu
Di Yu
author_sort Yang Yang
collection DOAJ
description Summary: Transcriptomic analysis plays a key role in biomedical research. Linear dimensionality reduction methods, especially principal-component analysis (PCA), are widely used in detecting sample-to-sample heterogeneity, while recently developed non-linear methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), can efficiently cluster heterogeneous samples in single-cell RNA sequencing analysis. Yet the application of t-SNE and UMAP to bulk transcriptomic analysis, and their comparison with conventional methods, has not yet been carried out. We compare four major dimensionality reduction methods (PCA, multidimensional scaling [MDS], t-SNE, and UMAP) in analyzing 71 large bulk transcriptomic datasets. UMAP is superior to PCA and MDS and shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space. Importantly, UMAP generates sample clusters uncovering biological features and clinical meaning. We recommend deploying UMAP in visualizing and analyzing sizable bulk transcriptomic datasets to reinforce sample heterogeneity analysis.
first_indexed 2024-12-22T09:19:08Z
format Article
id doaj.art-260c025e813f47bfa2ae65a54b9ed7af
institution Directory of Open Access Journals
issn 2211-1247
language English
last_indexed 2024-12-22T09:19:08Z
publishDate 2021-07-01
publisher Elsevier
record_format Article
series Cell Reports
spelling doaj.art-260c025e813f47bfa2ae65a54b9ed7af
2022-12-21T18:31:14Z
eng
Elsevier
Cell Reports
2211-1247
2021-07-01
Volume 36, issue 4, article 109442
Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data
Yang Yang: The University of Queensland Diamantina Institute, Faculty of Medicine, The University of Queensland, Translational Research Institute, Brisbane, QLD, Australia; Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
Hongjian Sun: Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; School of Microelectronics, Shandong University, Jinan, China
Yu Zhang: Laboratory of Immunology for Environment and Health, School of Pharmaceutical Sciences, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
Tiefu Zhang: University of Electronic Science and Technology of China, Chengdu, China
Jialei Gong: Shenzhen Key Laboratory of Fertility Regulation, Center of Assisted Reproduction and Embryology, University of Hong Kong, Shenzhen Hospital, Shenzhen, China
Yunbo Wei: Laboratory of Immunology for Environment and Health, School of Pharmaceutical Sciences, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
Yong-Gang Duan: Shenzhen Key Laboratory of Fertility Regulation, Center of Assisted Reproduction and Embryology, University of Hong Kong, Shenzhen Hospital, Shenzhen, China
Minglei Shu: Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
Yuchen Yang: Department of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; McAllister Heart Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Di Wu: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Division of Oral and Craniofacial Health Science, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA; Corresponding author
Di Yu: The University of Queensland Diamantina Institute, Faculty of Medicine, The University of Queensland, Translational Research Institute, Brisbane, QLD, Australia; Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Laboratory of Immunology for Environment and Health, School of Pharmaceutical Sciences, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Corresponding author
Summary: Transcriptomic analysis plays a key role in biomedical research. Linear dimensionality reduction methods, especially principal-component analysis (PCA), are widely used in detecting sample-to-sample heterogeneity, while recently developed non-linear methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), can efficiently cluster heterogeneous samples in single-cell RNA sequencing analysis. Yet the application of t-SNE and UMAP to bulk transcriptomic analysis, and their comparison with conventional methods, has not yet been carried out. We compare four major dimensionality reduction methods (PCA, multidimensional scaling [MDS], t-SNE, and UMAP) in analyzing 71 large bulk transcriptomic datasets. UMAP is superior to PCA and MDS and shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space. Importantly, UMAP generates sample clusters uncovering biological features and clinical meaning. We recommend deploying UMAP in visualizing and analyzing sizable bulk transcriptomic datasets to reinforce sample heterogeneity analysis.
http://www.sciencedirect.com/science/article/pii/S2211124721008597
bulk transcriptomics
dimensionality reduction
UMAP
t-SNE
PCA
clustering structure
title Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data
topic bulk transcriptomics
dimensionality reduction
UMAP
t-SNE
PCA
clustering structure
url http://www.sciencedirect.com/science/article/pii/S2211124721008597