Evaluation of linear models and missing value imputation for the analysis of peptide-centric proteomics

Abstract. Background: Several methods are evaluated for handling data generated from bottom-up proteomics via liquid chromatography-mass spectrometry, particularly for peptide-centric quantification in post-translational modification (PTM) analysis, such as reversible cysteine oxidation. The paper...


Bibliographic Details
Main Authors: Philip Berg, Evan W. McConnell, Leslie M. Hicks, Sorina C. Popescu, George V. Popescu
Format: Article
Language: English
Published: BMC 2019-03-01
Series: BMC Bioinformatics
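Subjects: Post-translational modifications; Redox proteome; Mass spectrometry; Multiple imputation; Linear regression models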
Online Access: http://link.springer.com/article/10.1186/s12859-019-2619-6
author Philip Berg
Evan W. McConnell
Leslie M. Hicks
Sorina C. Popescu
George V. Popescu
collection DOAJ
description Abstract. Background: Several methods are evaluated for handling data generated from bottom-up proteomics via liquid chromatography-mass spectrometry, particularly for peptide-centric quantification in post-translational modification (PTM) analysis, such as reversible cysteine oxidation. The paper proposes a pipeline based on the R programming language to analyze PTMs from peptide-centric label-free quantitative proteomics data. Results: Our methodology includes variance stabilization, normalization, and missing data imputation to account for the large dynamic range of PTM measurements. It also corrects biases from an enrichment protocol and reduces the random and systematic errors associated with label-free quantification. We test the performance of the methodology through proteome-wide differential PTM quantitation using linear models analysis (limma). We objectively compare two imputation methods, along with significance testing when multiple imputation is used for missing data. Conclusion: Identifying PTMs in large-scale datasets is a problem with distinct characteristics that requires new methods for missing data imputation and differential proteome analysis. Linear models in combination with multiple imputation could significantly outperform a t-test-based decision method.
first_indexed 2024-12-12T16:13:50Z
format Article
id doaj.art-1e7ef4188c3649d0af4169d4aa58b3d0
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-12-12T16:13:50Z
publishDate 2019-03-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-1e7ef4188c3649d0af4169d4aa58b3d0 2022-12-22T00:19:08Z eng BMC BMC Bioinformatics 1471-2105 2019-03-01 20S2716 10.1186/s12859-019-2619-6
Philip Berg (Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University)
Evan W. McConnell (Department of Chemistry, University of North Carolina at Chapel Hill)
Leslie M. Hicks (Department of Chemistry, University of North Carolina at Chapel Hill)
Sorina C. Popescu (Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University)
George V. Popescu (Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University)
topic Post-translational modifications
Redox proteome
Mass spectrometry
Multiple imputation
Linear regression models
url http://link.springer.com/article/10.1186/s12859-019-2619-6