Transfer posterior error probability estimation for peptide identification

Abstract Background In shotgun proteomics, database searching of tandem mass spectra results in a great number of peptide-spectrum matches (PSMs), many of which are false positives. Quality control of PSMs is a multiple hypothesis testing problem, and the false discovery rate (FDR) or the posterior...


Bibliographic Details
Main Authors:	Xinpei Yi, Fuzhou Gong, Yan Fu
Format:	Article
Language:	English
Published:	BMC 2020-05-01
Series:	BMC Bioinformatics
Subjects:	Proteomics; Mass spectrometry; Quality control; Posterior error probability; Local false discovery rate; Transfer learning
Online Access:	http://link.springer.com/article/10.1186/s12859-020-3485-y

_version_	1811322807786668032
author	Xinpei Yi, Fuzhou Gong, Yan Fu
author_sort	Xinpei Yi
collection	DOAJ
description	Abstract. Background: In shotgun proteomics, database searching of tandem mass spectra produces a large number of peptide-spectrum matches (PSMs), many of which are false positives. Quality control of PSMs is a multiple hypothesis testing problem, and the false discovery rate (FDR) or the posterior error probability (PEP) is the commonly used statistical confidence measure. PEP, also called local FDR, evaluates the confidence of individual PSMs and is therefore more desirable than FDR, which evaluates the global confidence of a collection of PSMs. PEP can be estimated by decomposing the distribution of PSM scores into its null and alternative components, provided that enough data are available. However, in many proteomic studies, only a group (subset) of PSMs, e.g. those with specific post-translational modifications, is of interest. The group can be very small, which makes direct PEP estimation from the group data inaccurate, especially in the high-score region where the score threshold is set. Using the whole set of PSMs to estimate the group PEP is also inappropriate, because the null and/or alternative distributions of the group can be very different from those of the combined scores. Results: The transfer PEP algorithm is proposed to estimate more accurately the PEPs of peptide identifications in small groups. Transfer PEP derives the group null distribution through its empirical relationship with the combined null distribution, and estimates the group alternative distribution, as well as the null proportion, using an iterative semi-parametric method. Validated on both simulated data and real proteomic data, transfer PEP showed remarkably higher accuracy than the direct combined and separate PEP estimation methods. Conclusions: We presented a novel approach to group PEP estimation for small groups and implemented it for the peptide identification problem in proteomics. The methodology of the approach is in principle applicable to small-group PEP estimation problems in other fields.
first_indexed	2024-04-13T13:42:52Z
format	Article
id	doaj.art-b72063daeae440b1a1d3fb6a27900210
institution	Directory of Open Access Journals
issn	1471-2105
language	English
last_indexed	2024-04-13T13:42:52Z
publishDate	2020-05-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
title	Transfer posterior error probability estimation for peptide identification
topic	Proteomics; Mass spectrometry; Quality control; Posterior error probability; Local false discovery rate; Transfer learning
url	http://link.springer.com/article/10.1186/s12859-020-3485-y
work_keys_str_mv	AT xinpeiyi transferposteriorerrorprobabilityestimationforpeptideidentification AT fuzhougong transferposteriorerrorprobabilityestimationforpeptideidentification AT yanfu transferposteriorerrorprobabilityestimationforpeptideidentification
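
The abstract in the record above describes PEP (local FDR) as a per-PSM measure obtained by decomposing the score distribution into null and alternative components, in contrast to FDR, which describes a whole accepted set. The following is a minimal sketch of that relationship only. It is not the transfer PEP algorithm from the article: the Gaussian null and alternative densities, the null proportion of 0.5, and all function and variable names are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed score model, illustration only)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def pep(score, pi0, null_pdf, alt_pdf):
    """PEP(s) = pi0*f0(s) / (pi0*f0(s) + (1 - pi0)*f1(s)).

    pi0 is the null proportion, f0 the null density, f1 the alternative density.
    """
    num = pi0 * null_pdf(score)
    den = num + (1.0 - pi0) * alt_pdf(score)
    # If both densities underflow to zero, fall back to the conservative value 1.
    return num / den if den > 0.0 else 1.0


def fdr_at_threshold(scores, peps, threshold):
    """Estimate the FDR of the accepted set as the mean PEP of accepted PSMs."""
    accepted = [p for s, p in zip(scores, peps) if s >= threshold]
    return sum(accepted) / len(accepted) if accepted else 0.0


# Hypothetical mixture: incorrect PSMs ~ N(0, 1), correct PSMs ~ N(3, 1), pi0 = 0.5.
PI0 = 0.5


def null_pdf(s):
    return normal_pdf(s, 0.0, 1.0)


def alt_pdf(s):
    return normal_pdf(s, 3.0, 1.0)


scores = [0.0, 1.5, 3.0, 4.5]
peps = [pep(s, PI0, null_pdf, alt_pdf) for s in scores]
fdr = fdr_at_threshold(scores, peps, threshold=3.0)
```

In this sketch each PSM gets its own PEP, while the FDR of the set accepted at a threshold is the average of those PEPs, so the FDR of a set is never higher than the PEP of its lowest-scoring accepted member when PEP decreases with score. With a small group, the densities and null proportion would have to be estimated from few points, which is the difficulty the abstract points to.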


Similar Items