A protein domain co-occurrence network approach for predicting protein function and inferring species phylogeny.
The Protein Domain Co-occurrence Network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated th...
Main Authors: | Zheng Wang, Xue-Cheng Zhang, Mi Ha Le, Dong Xu, Gary Stacey, Jianlin Cheng |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2011-03-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC3063783?pdf=render |
_version_ | 1811282707905249280 |
---|---|
author | Zheng Wang Xue-Cheng Zhang Mi Ha Le Dong Xu Gary Stacey Jianlin Cheng |
collection | DOAJ |
description | The Protein Domain Co-occurrence Network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated the possibility of using DCNs to predict protein and domain functions. In an experiment on 66 randomly selected proteins, the best of the top 3 predictions made by our DCN-based aggregated neighbor-counting method achieved a semantic similarity score of 0.81 to the actual Gene Ontology terms of the proteins. Moreover, the top 3 predictions using neighbor-counting, χ², and an SVM-based method achieved accuracies of 66%, 59%, and 61%, respectively, when used to predict specific Gene Ontology terms of human target domains. On average, these predictions had semantic similarity scores of 0.82, 0.80, and 0.79 to the actual Gene Ontology terms, respectively. We also used DCNs to predict whether a domain is an enzyme domain, and our SVM-based and neighbor-inference methods correctly classified 79% and 77% of the target domains, respectively. When using DCNs to classify a target domain into one of the six enzyme classes, we found that, as long as at least one EC number is available in the neighboring domains, our SVM-based and neighbor-counting methods correctly classified 92.4% and 91.9% of the target domains, respectively. Furthermore, we benchmarked the performance of using DCNs to infer species phylogenies on six different combinations of 398 single-chromosome prokaryotic genomes. The phylogenetic tree of 54 prokaryotic taxa generated by our DCN-alignment-based method achieved a 93.45% similarity score when compared with Bergey's taxonomy. In summary, our studies show that genome-wide DCNs contain rich information that can be effectively used to decipher protein function and reveal the evolutionary relationships among species. |
first_indexed | 2024-04-13T01:57:27Z |
format | Article |
id | doaj.art-9cc7dafdac8f4edc905d174bcd60bf70 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-04-13T01:57:27Z |
publishDate | 2011-03-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-9cc7dafdac8f4edc905d174bcd60bf70; 2022-12-22T03:07:44Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2011-03-01; vol. 6, no. 3, e17906; doi:10.1371/journal.pone.0017906; http://europepmc.org/articles/PMC3063783?pdf=render |
title | A protein domain co-occurrence network approach for predicting protein function and inferring species phylogeny. |
url | http://europepmc.org/articles/PMC3063783?pdf=render |
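
The description above mentions building domain co-occurrence networks and ranking candidate Gene Ontology terms by neighbor-counting. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes that two domains are linked when they appear in the same protein, and that a term's score is simply the number of neighboring domains annotated with it. The function names (`build_dcn`, `neighbor_counting`), the domain labels, and the annotation terms are all hypothetical placeholders invented for illustration; the record does not specify how the "aggregated" variant, the χ² method, or the SVM features are computed, so those are not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_dcn(protein_domains):
    """Return an adjacency map: domain -> set of domains that share a protein with it.

    Assumption: an (unweighted) edge exists between two domains whenever
    they co-occur in at least one protein.
    """
    dcn = defaultdict(set)
    for domains in protein_domains.values():
        unique = set(domains)
        for domain in unique:
            dcn[domain]  # touch the key so isolated domains are kept as nodes
        for a, b in combinations(sorted(unique), 2):
            dcn[a].add(b)
            dcn[b].add(a)
    return dict(dcn)


def neighbor_counting(dcn, annotations, target, top_k=3):
    """Rank candidate terms for `target` by how many of its neighbors carry them."""
    counts = Counter()
    for neighbor in dcn.get(target, ()):
        # Each neighbor votes at most once per term.
        counts.update(set(annotations.get(neighbor, ())))
    # Sort by descending count, then alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


# Toy, made-up input: protein -> domains found in it (all labels hypothetical).
protein_domains = {
    "protA": ["dom1", "dom2", "dom3"],
    "protB": ["dom2", "dom4"],
    "protC": ["dom1", "dom4"],
    "protD": ["dom5"],
}

# Toy, made-up annotations: domain -> known functional terms.
annotations = {
    "dom1": ["term_kinase", "term_binding"],
    "dom3": ["term_kinase"],
    "dom4": ["term_binding", "term_transport"],
}

dcn = build_dcn(protein_domains)
predictions = neighbor_counting(dcn, annotations, "dom2")
degrees = {domain: len(neighbors) for domain, neighbors in dcn.items()}

print(predictions)
print(degrees)
```

In this sketch the unannotated domain `dom2` receives its candidate terms from `dom1`, `dom3`, and `dom4`. The `degrees` map is the kind of per-node statistic one would inspect when checking whether a network's degree distribution looks scale-free, although a toy network this small says nothing about that property.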