Integration of Human Protein Sequence and Protein-Protein Interaction Data by Graph Autoencoder to Identify Novel Protein-Abnormal Phenotype Associations

Bibliographic Details
Main Authors: Yuan Liu, Ruirui He, Yingjie Qu, Yuan Zhu, Dianke Li, Xinping Ling, Simin Xia, Zhenqiu Li, Dong Li
Format: Article
Language: English
Published: MDPI AG 2022-08-01
Series: Cells
Subjects: deep learning; graph autoencoder; protein-phenotype associations prediction
Online Access: https://www.mdpi.com/2073-4409/11/16/2485
author Yuan Liu
Ruirui He
Yingjie Qu
Yuan Zhu
Dianke Li
Xinping Ling
Simin Xia
Zhenqiu Li
Dong Li
collection DOAJ
description Understanding gene functions and their associated abnormal phenotypes is crucial for the prevention, diagnosis, and treatment of diseases. The Human Phenotype Ontology (HPO) is a standardized vocabulary for describing the phenotype abnormalities associated with human diseases. However, current HPO annotations are far from complete, and only a small fraction of human protein-coding genes have HPO annotations. Thus, it is necessary to predict protein-phenotype associations using computational methods. Protein sequences can indicate the structure and function of proteins, and interacting proteins are more likely to share the same function, so integrating these features is a promising way to predict HPO annotations of human proteins. We developed GraphPheno, a semi-supervised method based on graph autoencoders that captures deep features from protein sequences without feature engineering, while also taking into account the topological properties of the protein–protein interaction network, to predict the relationships between human genes/proteins and abnormal phenotypes. Cross-validation and independent dataset tests show that GraphPheno has satisfactory prediction performance. The algorithm is further validated on automatic HPO annotation of no-knowledge proteins under the benchmark of the second Critical Assessment of Functional Annotation, 2013–2014 (CAFA2), where GraphPheno surpasses most existing methods. Further bioinformatics analysis shows that the genes GraphPheno predicts to be associated with certain phenotypes share similar biological properties with known ones. In a case study on the phenotype of abnormality of the mitochondrial respiratory chain, the top prioritized genes are validated by recent papers. We believe that GraphPheno will help reveal more associations between genes and phenotypes and contribute to the discovery of drug targets.
first_indexed 2024-03-09T04:37:29Z
format Article
id doaj.art-e5da7215c649434685468aafebc8c7b1
institution Directory of Open Access Journals
issn 2073-4409
language English
last_indexed 2024-03-09T04:37:29Z
publishDate 2022-08-01
publisher MDPI AG
record_format Article
series Cells
spelling doaj.art-e5da7215c649434685468aafebc8c7b1
2023-12-03T13:26:51Z
eng
MDPI AG
Cells
2073-4409
2022-08-01
Volume 11, issue 16, article 2485
DOI: 10.3390/cells11162485
Integration of Human Protein Sequence and Protein-Protein Interaction Data by Graph Autoencoder to Identify Novel Protein-Abnormal Phenotype Associations
Authors (affiliation index): Yuan Liu (0), Ruirui He (1), Yingjie Qu (2), Yuan Zhu (3), Dianke Li (4), Xinping Ling (5), Simin Xia (6), Zhenqiu Li (7), Dong Li (8)
Affiliations 0-6 and 8: State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
Affiliation 7: College of Life Sciences, Hebei University, Baoding 071002, China
https://www.mdpi.com/2073-4409/11/16/2485
Keywords: deep learning; graph autoencoder; protein-phenotype associations prediction
title Integration of Human Protein Sequence and Protein-Protein Interaction Data by Graph Autoencoder to Identify Novel Protein-Abnormal Phenotype Associations
topic deep learning
graph autoencoder
protein-phenotype associations prediction
url https://www.mdpi.com/2073-4409/11/16/2485