Multi-view Subspace Clustering Analysis for Aggregating Multiple Heterogeneous Omics Data


Bibliographic Details
Main Authors:	Qianqian Shi, Bing Hu, Tao Zeng, Chuanchao Zhang
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2019-08-01
Series:	Frontiers in Genetics
ISSN:	1664-8021
DOI:	10.3389/fgene.2019.00744
Subjects:	multi-view subspace clustering analysis; data integration; heterogeneity; low-rank representation; graph diffusion
Online Access:	https://www.frontiersin.org/article/10.3389/fgene.2019.00744/full

Full description

Integration of distinct biological data types can provide a comprehensive view of biological processes or complex diseases. The combinations of molecules responsible for different phenotypes form multiple embedded (expression) subspaces, which makes the intrinsic data structure hard to identify with conventional integration methods. In this paper, we propose a novel framework, “Multi-view Subspace Clustering Analysis (MSCA),” which measures the local similarities of samples within the same subspace and obtains the global consensus sample patterns (structures) across multiple data types, thereby comprehensively capturing the underlying heterogeneity of samples. On various synthetic datasets, MSCA effectively recognizes the predefined sample patterns and is robust to noise. On a real biological dataset, the Cancer Cell Line Encyclopedia (CCLE), MSCA identifies cell clusters with common aberrations across cancer types. Our simulation and case studies also show a marked advantage of MSCA over state-of-the-art methods such as iClusterPlus, SNF, and ANF.
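The abstract only outlines MSCA: per-view similarities of samples that share a subspace, fused into a global consensus, with "low-rank representation" and "graph diffusion" listed among the subjects. The Python/NumPy sketch below is a generic illustration of that data flow under stated assumptions, not the authors' published MSCA objective or solver. It assumes noise-free views (so the textbook closed-form low-rank representation Z = V_r V_r^T applies), a simple random-walk diffusion, and spectral clustering; the function names and the parameters `rank`, `steps` and `alpha` are hypothetical.

```python
import numpy as np

# Hypothetical sketch of multi-view subspace clustering; NOT the published
# MSCA algorithm. Assumes noise-free views so a closed-form LRR applies.


def lrr_affinity(X, rank):
    """Affinity from the noise-free low-rank representation of one view.

    X is features x samples. min ||Z||_* s.t. X = X Z has the closed-form
    minimizer Z = V_r V_r^T (the shape interaction matrix), where V_r holds
    the top-`rank` right singular vectors of X.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:rank].T                      # samples x rank
    Z = V @ V.T                          # samples x samples
    return np.abs(Z) + np.abs(Z.T)       # symmetric, nonnegative affinity


def diffuse(W, steps=3, alpha=0.8):
    """Random-walk diffusion S = sum_{t=0..steps} alpha^t P^t, P = D^-1 W."""
    P = W / W.sum(axis=1, keepdims=True)
    S = np.eye(len(W))
    term = np.eye(len(W))
    for _ in range(steps):
        term = alpha * term @ P
        S = S + term
    return (S + S.T) / 2                 # symmetrize for the spectral step


def consensus_clusters(views, rank, k, steps=3, alpha=0.8):
    """Average per-view affinities, diffuse, then spectral-cluster samples."""
    W = sum(lrr_affinity(X, rank) for X in views) / len(views)
    S = diffuse(W, steps, alpha)
    d = S.sum(axis=1)
    L = S / np.sqrt(np.outer(d, d))      # D^-1/2 S D^-1/2
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, -k:]                     # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Farthest-point seeding plus nearest-seed assignment: a simple
    # stand-in for k-means on the spectral embedding.
    seeds = [0]
    for _ in range(k - 1):
        closeness = np.max(U @ U[seeds].T, axis=1)
        seeds.append(int(np.argmin(closeness)))
    return np.argmax(U @ U[seeds].T, axis=1)


# Toy data: two views of 30 samples; each of 3 hidden groups lies in its
# own random 2-D subspace of a 20-D feature space in both views.
rng = np.random.default_rng(0)
labels = rng.permutation(np.repeat([0, 1, 2], 10))
views = []
for _ in range(2):
    X = np.zeros((20, labels.size))
    for c in range(3):
        idx = labels == c
        X[:, idx] = rng.standard_normal((20, 2)) @ rng.standard_normal((2, idx.sum()))
    views.append(X)

pred = consensus_clusters(views, rank=6, k=3)
```

On this toy data each per-view affinity is block-diagonal because the three subspaces are independent, so averaging and diffusion keep the groups separate and the recovered partition matches `labels` up to relabeling. With real, noisy omics data the closed form no longer holds; LRR is then typically solved iteratively (for example with an augmented Lagrangian method) and `rank` has to be chosen, so treat this only as a picture of the pipeline.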


Similar Items