Integration of Human Protein Sequence and Protein-Protein Interaction Data by Graph Autoencoder to Identify Novel Protein-Abnormal Phenotype Associations
Main Authors: | Yuan Liu, Ruirui He, Yingjie Qu, Yuan Zhu, Dianke Li, Xinping Ling, Simin Xia, Zhenqiu Li, Dong Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-08-01 |
Series: | Cells |
Online Access: | https://www.mdpi.com/2073-4409/11/16/2485 |
author | Yuan Liu, Ruirui He, Yingjie Qu, Yuan Zhu, Dianke Li, Xinping Ling, Simin Xia, Zhenqiu Li, Dong Li |
collection | DOAJ |
description | Understanding gene functions and their associated abnormal phenotypes is crucial for the prevention, diagnosis and treatment of diseases. The Human Phenotype Ontology (HPO) is a standardized vocabulary for describing the phenotypic abnormalities associated with human diseases. However, current HPO annotations are far from complete, and only a small fraction of human protein-coding genes have HPO annotations. Computational methods are therefore needed to predict protein-phenotype associations. Protein sequences can indicate the structure and function of proteins, and interacting proteins are more likely to share the same function, so integrating these features is a promising way to predict HPO annotations of human proteins. We developed GraphPheno, a semi-supervised method based on graph autoencoders that captures deep features from protein sequences without feature engineering, while also taking into account the topological properties of the protein–protein interaction network, to predict the relationships between human genes/proteins and abnormal phenotypes. Cross-validation and independent dataset tests show that GraphPheno has satisfactory prediction performance. The algorithm is further validated on automatic HPO annotation of no-knowledge proteins under the benchmark of the second Critical Assessment of Functional Annotation, 2013–2014 (CAFA2), where GraphPheno surpasses most existing methods. Further bioinformatics analysis shows that certain phenotype-associated genes predicted by GraphPheno share similar biological properties with known ones. In a case study on the phenotype abnormality of the mitochondrial respiratory chain, the top prioritized genes are validated by recent papers. We believe that GraphPheno will help to reveal more associations between genes and phenotypes and contribute to the discovery of drug targets. |
first_indexed | 2024-03-09T04:37:29Z |
id | doaj.art-e5da7215c649434685468aafebc8c7b1 |
institution | Directory of Open Access Journals |
issn | 2073-4409 |
last_indexed | 2024-03-09T04:37:29Z |
doi | 10.3390/cells11162485 |
citation | Cells, vol. 11, no. 16, art. 2485 (2022) |
affiliations | Yuan Liu, Ruirui He, Yingjie Qu, Yuan Zhu, Dianke Li, Xinping Ling, Simin Xia, Dong Li: State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China; Zhenqiu Li: College of Life Sciences, Hebei University, Baoding 071002, China |
topic | deep learning; graph autoencoder; protein-phenotype associations prediction |