PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity
Main Authors: | Bretaña Neil, Lee Tzong-Yi, Lu Cheng-Tsung |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2011-06-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/12/261 |
_version_ | 1818565679544008704 |
---|---|
author | Bretaña Neil Lee Tzong-Yi Lu Cheng-Tsung |
author_facet | Bretaña Neil Lee Tzong-Yi Lu Cheng-Tsung |
author_sort | Bretaña Neil |
collection | DOAJ |
description | <p>Abstract</p> <p>Background</p> <p>Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. Because high-throughput mass spectrometry-based experiments are difficult to perform, there is a need to predict phosphorylation sites using computational methods. However, previous studies on <it>in silico</it> prediction of plant phosphorylation sites do not consider kinase-specific phosphorylation data. We therefore propose a new method that investigates different substrate specificities in plant phosphorylation sites.</p> <p>Results</p> <p>Experimentally verified phosphorylation data were extracted from TAIR9, a protein database containing 3006 phosphorylation data from the plant species <it>Arabidopsis thaliana</it>. To investigate the various substrate motifs in plant phosphorylation, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. A profile hidden Markov model (HMM) is then applied to learn a predictive model for each subgroup. Cross-validation of the MDD-clustered HMMs yields an average accuracy of 82.4% for serine, 78.6% for threonine, and 89.0% for tyrosine models. Moreover, independent tests using <it>Arabidopsis thaliana</it> phosphorylation data from UniProtKB/Swiss-Prot show that the proposed models correctly predict 81.4% of phosphoserine, 77.1% of phosphothreonine, and 83.7% of phosphotyrosine sites. Interestingly, several MDD-clustered subgroups show amino acid conservation similar to the substrate motifs of well-known kinases from Phospho.ELM, a database containing kinase-specific phosphorylation data from multiple organisms.</p> <p>Conclusions</p> <p>This work presents a novel method for identifying plant phosphorylation sites with various substrate motifs. 
Based on cross-validation and independent testing, the MDD-clustered models outperform models trained without MDD. The proposed method has been implemented as a web-based plant phosphorylation prediction tool, PlantPhos <url>http://csb.cse.yzu.edu.tw/PlantPhos/</url>. Additionally, two case studies are presented to further evaluate the effectiveness of PlantPhos.</p> |
first_indexed | 2024-12-14T01:44:14Z |
format | Article |
id | doaj.art-cef32530c3dd407996ca88fcb1d98fcc |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-14T01:44:14Z |
publishDate | 2011-06-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-cef32530c3dd407996ca88fcb1d98fcc; 2022-12-21T23:21:37Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2011-06-01; 12; 1; 261; 10.1186/1471-2105-12-261; PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity; Bretaña Neil; Lee Tzong-Yi; Lu Cheng-Tsung; http://www.biomedcentral.com/1471-2105/12/261 |
spellingShingle | Bretaña Neil Lee Tzong-Yi Lu Cheng-Tsung PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity BMC Bioinformatics |
title | PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity |
url | http://www.biomedcentral.com/1471-2105/12/261 |
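
The abstract in this record describes clustering phosphorylation data with maximal dependence decomposition (MDD) before training a model per subgroup. The record does not give the procedure itself, so the following is only a minimal sketch of the general idea, under stated assumptions: sites are given as equal-length sequence windows, dependence between two window positions is scored with a Pearson chi-square statistic on raw residue identities, and the set is split on the most common residue at the most dependent position. The function names, the toy windows, and the splitting rule are all hypothetical; the published method may group amino acids or choose splits differently.

```python
from collections import Counter
from itertools import product


def chi_square(seqs, i, j):
    """Pearson chi-square statistic for independence of residues at columns i and j."""
    n = len(seqs)
    row = Counter(s[i] for s in seqs)              # residue counts at column i
    col = Counter(s[j] for s in seqs)              # residue counts at column j
    joint = Counter((s[i], s[j]) for s in seqs)    # co-occurrence counts
    stat = 0.0
    for a, b in product(row, col):
        expected = row[a] * col[b] / n             # count expected under independence
        stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def most_dependent_position(seqs, skip=()):
    """Return the column whose summed chi-square against all other columns is largest."""
    cols = [p for p in range(len(seqs[0])) if p not in skip]
    totals = {i: sum(chi_square(seqs, i, j) for j in cols if j != i) for i in cols}
    return max(totals, key=totals.get)


def split_on_position(seqs, pos):
    """Split windows by whether they carry the most common residue at column pos."""
    consensus = Counter(s[pos] for s in seqs).most_common(1)[0][0]
    has = [s for s in seqs if s[pos] == consensus]
    rest = [s for s in seqs if s[pos] != consensus]
    return consensus, has, rest


# Hypothetical toy windows: column 1 is the phosphorylated serine and is skipped.
windows = ["RSAD", "RSGD", "KSAE", "KSGE"]
pos = most_dependent_position(windows, skip=(1,))
residue, subgroup, remainder = split_on_position(windows, pos)
```

In this toy set, columns 0 and 3 vary together while column 2 varies independently of both, so the split falls on one of the two coupled columns. Applying the same two steps recursively to each subgroup, until no position shows significant dependence or a subgroup becomes too small, would give the kind of motif-specific clusters on which a separate model could then be trained.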