Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity

Complex diseases, such as breast cancer, are often caused by mutations of multiple functional genes. Identifying disease-related genes is a critical and challenging task for unveiling the biological mechanisms behind these diseases. In this study, we develop a novel computational framework to analyz...

Bibliographic Details
Main Authors: Yan Zhang, Ju Xiang, Liang Tang, Jianming Li, Qingqing Lu, Geng Tian, Bin-Sheng He, Jialiang Yang
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-08-01
Series: Frontiers in Genetics
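Subjects: disease-gene prediction; protein-protein interactions; KEGG pathway; breast cancer; network propagation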
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2021.596794/full
_version_ 1819081527780179968
author Yan Zhang
Ju Xiang
Liang Tang
Jianming Li
Qingqing Lu
Geng Tian
Bin-Sheng He
Jialiang Yang
author_facet Yan Zhang
Ju Xiang
Liang Tang
Jianming Li
Qingqing Lu
Geng Tian
Bin-Sheng He
Jialiang Yang
author_sort Yan Zhang
collection DOAJ
description Complex diseases, such as breast cancer, are often caused by mutations of multiple functional genes. Identifying disease-related genes is a critical and challenging task for unveiling the biological mechanisms behind these diseases. In this study, we develop a novel computational framework to analyze the network properties of the known breast cancer–associated genes, based on which we develop a random-walk-with-restart (RCRWR) algorithm to predict novel disease genes. Specifically, we first curated a set of breast cancer–associated genes from the Genome-Wide Association Studies catalog and Online Mendelian Inheritance in Man database and then studied the distribution of these genes on an integrated protein–protein interaction (PPI) network. We found that the breast cancer–associated genes are significantly closer to each other than random, which confirms the modularity property of disease genes in a PPI network as revealed by previous studies. We then retrieved PPI subnetworks spanning top breast cancer–associated KEGG pathways and found that the distribution of these genes on the subnetworks is non-random, suggesting that these KEGG pathways are activated non-uniformly. Taking advantage of the non-random distribution of breast cancer–associated genes, we developed an improved RCRWR algorithm to predict novel cancer genes, which integrates network reconstruction based on local random walk dynamics and subnetworks spanning KEGG pathways. Compared with disease gene prediction that does not use the information from the KEGG pathways, this method has better prediction performance in inferring breast cancer–associated genes, and the top predicted genes are better enriched in known breast cancer–associated gene ontologies. Finally, we performed a literature search on top predicted novel genes and found that most of them are supported by at least wet-lab experiments on cell lines.
In summary, we propose a robust computational framework to prioritize novel breast cancer–associated genes, which could be used for further in vitro and in vivo experimental validation.
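Note: the description above mentions random walk with restart on a PPI network. As a rough illustration only, the following is a minimal sketch of the generic random-walk-with-restart iteration, assuming an unweighted, undirected network stored as a dictionary. It is not the authors' RCRWR algorithm: the network reconstruction and KEGG subnetwork steps are omitted, and the function names, the restart value of 0.5, and the toy genes G1 to G5 are all hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency: dict mapping node -> set of neighbours (undirected graph)
    seeds: known disease genes; the walker restarts uniformly on these
    restart: probability of jumping back to a seed at each step (assumed value)
    """
    nodes = list(adjacency)
    seed_set = {s for s in seeds if s in adjacency}
    if not seed_set:
        raise ValueError("no seed gene is present in the network")

    # Initial distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # Restart term, then spread the remaining mass along the edges.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: keep its walking mass in place.
                nxt[u] += (1.0 - restart) * p[u]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def rank_candidates(scores, seeds):
    """Rank non-seed genes by descending score."""
    seed_set = set(seeds)
    candidates = [g for g in scores if g not in seed_set]
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


# Toy network with placeholder gene names (not real interaction data).
toy_network = {
    "G1": {"G2", "G3"},
    "G2": {"G1", "G3"},
    "G3": {"G1", "G2", "G4"},
    "G4": {"G3", "G5"},
    "G5": {"G4"},
}
toy_scores = random_walk_with_restart(toy_network, seeds=["G1"])
toy_ranking = rank_candidates(toy_scores, seeds=["G1"])
```

Genes that sit closer to the seed set in the network receive higher scores, which is the property a propagation-based method relies on when known disease genes cluster together.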
first_indexed 2024-12-21T20:02:12Z
format Article
id doaj.art-2841e9124e2f4e33aad562cf26f8e2fd
institution Directory of Open Access Journals
issn 1664-8021
language English
last_indexed 2024-12-21T20:02:12Z
publishDate 2021-08-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-2841e9124e2f4e33aad562cf26f8e2fd
2022-12-21T18:51:56Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2021-08-01
12
10.3389/fgene.2021.596794
596794
Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity
Yan Zhang: School of Computer Science and Engineering, Central South University, Changsha, China; School of Information Science and Engineering, Changsha Medical University, Changsha, China; Academician Workstation, Changsha Medical University, Changsha, China
Ju Xiang: School of Computer Science and Engineering, Central South University, Changsha, China; Academician Workstation, Changsha Medical University, Changsha, China; Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
Liang Tang: Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
Jianming Li: Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
Qingqing Lu: Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis Beijing Co., Ltd., Beijing, China
Geng Tian: Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis Beijing Co., Ltd., Beijing, China
Bin-Sheng He: Academician Workstation, Changsha Medical University, Changsha, China; Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
Jialiang Yang: Academician Workstation, Changsha Medical University, Changsha, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis Beijing Co., Ltd., Beijing, China
https://www.frontiersin.org/articles/10.3389/fgene.2021.596794/full
disease-gene prediction
protein-protein interactions
KEGG pathway
breast cancer
network propagation
spellingShingle Yan Zhang
Ju Xiang
Liang Tang
Jianming Li
Qingqing Lu
Geng Tian
Bin-Sheng He
Jialiang Yang
Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity
Frontiers in Genetics
disease-gene prediction
protein-protein interactions
KEGG pathway
breast cancer
network propagation
title Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity
title_full Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity
title_fullStr Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity
title_full_unstemmed Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity
title_short Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity
title_sort identifying breast cancer related genes based on a novel computational framework involving kegg pathways and ppi network modularity
topic disease-gene prediction
protein-protein interactions
KEGG pathway
breast cancer
network propagation
url https://www.frontiersin.org/articles/10.3389/fgene.2021.596794/full
work_keys_str_mv AT yanzhang identifyingbreastcancerrelatedgenesbasedonanovelcomputationalframeworkinvolvingkeggpathwaysandppinetworkmodularity
AT juxiang identifyingbreastcancerrelatedgenesbasedonanovelcomputationalframeworkinvolvingkeggpathwaysandppinetworkmodularity
AT liangtang identifyingbreastcancerrelatedgenesbasedonanovelcomputationalframeworkinvolvingkeggpathwaysandppinetworkmodularity
AT jianmingli identifyingbreastcancerrelatedgenesbasedonanovelcomputationalframeworkinvolvingkeggpathwaysandppinetworkmodularity
AT qingqinglu identifyingbreastcancerrelatedgenesbasedonanovelcomputationalframeworkinvolvingkeggpathwaysandppinetworkmodularity
AT gengtian identifyingbreastcancerrelatedgenesbasedonanovelcomputationalframeworkinvolvingkeggpathwaysandppinetworkmodularity
AT binshenghe identifyingbreastcancerrelatedgenesbasedonanovelcomputationalframeworkinvolvingkeggpathwaysandppinetworkmodularity
AT jialiangyang identifyingbreastcancerrelatedgenesbasedonanovelcomputationalframeworkinvolvingkeggpathwaysandppinetworkmodularity