PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity



Bibliographic Details
Main Authors: Bretaña Neil, Lee Tzong-Yi, Lu Cheng-Tsung
Format: Article
Language: English
Published: BMC 2011-06-01
Series: BMC Bioinformatics
Online Access:http://www.biomedcentral.com/1471-2105/12/261
Collection: DOAJ (Directory of Open Access Journals)

Abstract

Background
Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. Because high-throughput mass spectrometry-based experiments are difficult to perform, computational methods for predicting phosphorylation sites are desirable. However, previous studies on in silico prediction of plant phosphorylation sites have not considered kinase-specific phosphorylation data. We therefore propose a new method that investigates the different substrate specificities of plant phosphorylation sites.

Results
Experimentally verified phosphorylation data were extracted from TAIR9, a protein database containing 3006 phosphorylation data entries from the plant species Arabidopsis thaliana. To investigate the various substrate motifs in plant phosphorylation, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups with significantly conserved motifs. A profile hidden Markov model (HMM) is then trained as the predictive model for each subgroup. In cross-validation, the MDD-clustered HMMs achieve an average accuracy of 82.4% for the serine, 78.6% for the threonine, and 89.0% for the tyrosine models. In an independent test on Arabidopsis thaliana phosphorylation data from UniProtKB/Swiss-Prot, the models correctly predict 81.4% of phosphoserine, 77.1% of phosphothreonine, and 83.7% of phosphotyrosine sites. Several MDD-clustered subgroups also show amino acid conservation similar to the substrate motifs of well-known kinases in Phospho.ELM, a database of kinase-specific phosphorylation data from multiple organisms.

Conclusions
This work presents a novel method for identifying plant phosphorylation sites with various substrate motifs. In both cross-validation and independent testing, the MDD-clustered models outperform models trained without MDD. The method has been implemented as a web-based plant phosphorylation prediction tool, PlantPhos (http://csb.cse.yzu.edu.tw/PlantPhos/). Two case studies are also presented to further evaluate the effectiveness of PlantPhos.
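
The abstract names MDD but does not describe how it chooses its splits. As a hedged illustration, the Python sketch below applies the general MDD procedure (originally developed for splice-site modelling) to fixed-length sequence windows centered on a phosphorylated residue. For each position i it takes the consensus residue C_i, sums chi-square dependence statistics between "matches C_i" and the residue at every other position j, splits the windows at the position with the largest sum, and recurses. This is not the PlantPhos implementation: the function names (`chi_square`, `dependence_scores`, `mdd`), the single-residue consensus (rather than any amino acid grouping), recursion into both subsets, and the `min_size`/`min_score` stopping thresholds are all assumptions for illustration, and the per-subgroup profile HMM training step is omitted.

```python
from __future__ import annotations

from collections import Counter


def chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    total = sum(sum(row) for row in table)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    stat = 0.0
    for row, r_sum in zip(table, row_sums):
        for observed, c_sum in zip(row, col_sums):
            expected = r_sum * c_sum / total
            if expected > 0:  # empty rows contribute nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def consensus(windows: list[str], pos: int) -> str:
    """Most frequent residue at one position (ties go to the first seen)."""
    return Counter(w[pos] for w in windows).most_common(1)[0][0]


def dependence_scores(windows: list[str], center: int) -> dict[int, float]:
    """S_i = sum over j != i of chi2(C_i, X_j), skipping the central S/T/Y."""
    length = len(windows[0])
    scores: dict[int, float] = {}
    for i in range(length):
        if i == center:
            continue
        c_i = consensus(windows, i)
        s_i = 0.0
        for j in range(length):
            if j in (i, center):
                continue
            residues = sorted({w[j] for w in windows})
            hit = Counter(w[j] for w in windows if w[i] == c_i)
            miss = Counter(w[j] for w in windows if w[i] != c_i)
            s_i += chi_square([[hit[a] for a in residues], [miss[a] for a in residues]])
        scores[i] = s_i
    return scores


def mdd(windows: list[str], center: int, min_size: int = 10,
        min_score: float = 50.0, path: str = "") -> dict[str, list[str]]:
    """Recursively split windows at the most dependent position; returns leaf clusters."""
    leaf = {path.rstrip("/") or "root": windows}
    if len(windows) < 2 * min_size:  # too small to split into two valid subsets
        return leaf
    scores = dependence_scores(windows, center)
    if not scores:
        return leaf
    best = max(scores, key=scores.get)  # position with maximal total dependence
    if scores[best] < min_score:  # hypothetical significance cut-off
        return leaf
    c = consensus(windows, best)
    plus = [w for w in windows if w[best] == c]
    minus = [w for w in windows if w[best] != c]
    if len(plus) < min_size or len(minus) < min_size:
        return leaf
    tag = f"{c}{best - center:+d}"  # e.g. "R-3": Arg three residues upstream
    clusters = mdd(plus, center, min_size, min_score, f"{path}{tag}/")
    clusters.update(mdd(minus, center, min_size, min_score, f"{path}not{tag}/"))
    return clusters


if __name__ == "__main__":
    # Hypothetical 7-residue windows with the phosphoserine at index 3.
    toy = ["RAASPAA"] * 12 + ["DAASEAA"] * 10
    result = mdd(toy, center=3, min_size=5, min_score=10.0)
    print({name: len(ws) for name, ws in result.items()})  # {'R-3': 12, 'notR-3': 10}
```

On the toy data, the coupling between positions -3 and +1 produces a single split into an "R-3" cluster and its complement. In a full pipeline along the lines the abstract describes, each leaf cluster would then be used to train its own predictive model.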
ISSN: 1471-2105
DOI: 10.1186/1471-2105-12-261
Citation: BMC Bioinformatics 2011, 12(1):261