Measuring pathway database coverage of the phosphoproteome

Bibliographic Details
Main Authors: Hannah Huckstep, Liam G. Fearnley, Melissa J. Davis
Format: Article
Language: English
Published: PeerJ Inc., 2021-05-01
Series: PeerJ
Online Access:https://peerj.com/articles/11298.pdf
Collection: DOAJ (Directory of Open Access Journals)
Abstract: Protein phosphorylation is one of the best-known post-translational mechanisms and plays a key role in the regulation of cellular processes. Over 100,000 distinct phosphorylation sites have been discovered in the last decade through constant improvement of mass spectrometry-based phosphoproteomics. However, data saturation is occurring, and the bottleneck of assigning biologically relevant functionality to phosphosites needs to be addressed. Data-driven approaches have had only limited success in revealing phosphosite functionality because of a range of limitations. The alternative, more suitable approach is to make use of prior knowledge from literature-derived databases. Here, we analysed seven widely used databases to shed light on their suitability for providing functional insights into phosphoproteomics data. We first determined the global coverage of each database at both the protein and phosphosite levels. We also determined how consistent each database was in its phosphorylation annotations compared to a global standard. Finally, we looked in detail at the coverage of each database over six experimental datasets. Our analysis highlights the relative strengths and weaknesses of each database, providing a guide to how each can best be used to identify biological mechanisms in phosphoproteomic data.
DOAJ record ID: doaj.art-7bfb676172584c3c93475e5fcd94713e
ISSN: 2167-8359
DOI: 10.7717/peerj.11298
Volume: 9
Article number: e11298
Affiliations:
Hannah Huckstep: Division of Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
Liam G. Fearnley: Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, Victoria, Australia
Melissa J. Davis: Division of Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
Subjects: Phosphoproteomics, Databases, Proteomics, Bioinformatics