CircRNA-Disease Associations Prediction Based on Metapath2vec++ and Matrix Factorization

Circular RNAs (circRNAs) are a novel class of endogenous non-coding RNAs. Evidence has shown that circRNAs are related to many biological processes and play essential roles in different biological functions. Although increasing numbers of circRNAs are discovered using high-throughput sequencing technologies, th...


Bibliographic Details
Main Authors: Yuchen Zhang, Xiujuan Lei, Zengqiang Fang, Yi Pan
Format: Article
Language: English
Published: Tsinghua University Press 2020-12-01
Series: Big Data Mining and Analytics
Subjects: circular RNAs (circRNAs), circRNA-disease associations, metapath2vec++, matrix factorization
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2020.9020025
_version_ 1811344543232032768
author Yuchen Zhang
Xiujuan Lei
Zengqiang Fang
Yi Pan
author_sort Yuchen Zhang
collection DOAJ
description Circular RNAs (circRNAs) are a novel class of endogenous non-coding RNAs. Evidence has shown that circRNAs are related to many biological processes and play essential roles in different biological functions. Although increasing numbers of circRNAs are being discovered with high-throughput sequencing technologies, these techniques are still time-consuming and costly. In this study, we propose a computational method for predicting circRNA-disease associations, called PCD_MVMF, which is based on metapath2vec++ and matrix factorization and integrates multiple data sources. To construct more reliable networks, various aspects are considered. Firstly, circRNA annotation, sequence, and functional similarity networks are established, and disease-related genes and semantics are adopted to construct disease functional and semantic similarity networks. Secondly, metapath2vec++ is applied to an integrated heterogeneous network to learn the embedded features and an initial prediction score. Finally, we use matrix factorization with similarity as a constraint and optimize it to obtain the final prediction results. Leave-one-out cross-validation, five-fold cross-validation, and F-measure are adopted to evaluate the performance of PCD_MVMF. These evaluations indicate that PCD_MVMF has better prediction performance than the other methods compared. To further illustrate the performance of PCD_MVMF, case studies of common diseases are conducted. Therefore, PCD_MVMF can be regarded as a reliable and useful circRNA-disease association prediction tool.
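The final step described above, matrix factorization with similarity as a constraint, can be illustrated with a minimal sketch. This record does not give the paper's objective function, so the code below is an assumption: a generic graph-regularized factorization that minimizes ||Y - U V^T||^2 + lam (||U||^2 + ||V||^2) + mu (tr(U^T L_c U) + tr(V^T L_d V)) by plain gradient descent, where L_c and L_d are graph Laplacians of the circRNA and disease similarity matrices. The function names, hyperparameters, and toy matrices are all hypothetical and are not taken from PCD_MVMF.

```python
import numpy as np

def laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a symmetric similarity matrix."""
    return np.diag(S.sum(axis=1)) - S

def similarity_regularized_mf(Y, S_c, S_d, rank=2, lam=0.1, mu=0.1,
                              lr=0.01, n_iter=2000, seed=0):
    """Factorize Y (circRNAs x diseases) as U @ V.T by gradient descent.

    Assumed objective (generic, not the paper's exact formulation):
      ||Y - U V^T||_F^2 + lam (||U||_F^2 + ||V||_F^2)
      + mu (tr(U^T L_c U) + tr(V^T L_d V))
    The Laplacian terms pull the latent vectors of similar circRNAs
    (and of similar diseases) toward each other.
    """
    rng = np.random.default_rng(seed)
    n_c, n_d = Y.shape
    U = 0.1 * rng.standard_normal((n_c, rank))
    V = 0.1 * rng.standard_normal((n_d, rank))
    L_c, L_d = laplacian(S_c), laplacian(S_d)
    for _ in range(n_iter):
        R = Y - U @ V.T  # residual of the current reconstruction
        grad_U = -2 * R @ V + 2 * lam * U + 2 * mu * L_c @ U
        grad_V = -2 * R.T @ U + 2 * lam * V + 2 * mu * L_d @ V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

# Toy data (hypothetical): 4 circRNAs, 3 diseases, two clear groups.
Y = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 1]], dtype=float)
S_c = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
S_d = np.array([[1, 0, 0],
                [0, 1, 1],
                [0, 1, 1]], dtype=float)

U, V = similarity_regularized_mf(Y, S_c, S_d)
scores = U @ V.T  # predicted association scores for every circRNA-disease pair
```

In a setting like the one the abstract describes, Y would presumably hold known associations (possibly combined with the initial scores from the embedding step), and unobserved pairs would be ranked by their entries in `scores`. How PCD_MVMF actually combines these inputs is not stated in this record.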
first_indexed 2024-04-13T19:48:49Z
format Article
id doaj.art-02fbea7cc3f3483e92516e51fd89c72e
institution Directory Open Access Journal
issn 2096-0654
language English
last_indexed 2024-04-13T19:48:49Z
publishDate 2020-12-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj.art-02fbea7cc3f3483e92516e51fd89c72e
2022-12-22T02:32:36Z
eng
Tsinghua University Press
Big Data Mining and Analytics
2096-0654
2020-12-01
vol. 3, no. 4, pp. 280-291
10.26599/BDMA.2020.9020025
CircRNA-Disease Associations Prediction Based on Metapath2vec++ and Matrix Factorization
Yuchen Zhang, School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
Xiujuan Lei, School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
Zengqiang Fang, School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
Yi Pan, Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
https://www.sciopen.com/article/10.26599/BDMA.2020.9020025
circular rnas (circrnas)
circrna-disease associations
metapath2vec++
matrix factorization
title CircRNA-Disease Associations Prediction Based on Metapath2vec++ and Matrix Factorization
topic circular rnas (circrnas)
circrna-disease associations
metapath2vec++
matrix factorization
url https://www.sciopen.com/article/10.26599/BDMA.2020.9020025