Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics.

Bibliographic Details
Main Authors: Marie Chion, Christine Carapito, Frédéric Bertrand
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-08-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1010420
collection DOAJ
description Imputing missing values is common practice in label-free quantitative proteomics. Imputation replaces a missing value with a user-defined one. However, the imputation step is often not properly accounted for downstream, because imputed datasets are treated as if they had always been complete. Hence, the uncertainty due to imputation is not adequately taken into account. We provide a rigorous multiple imputation strategy that, thanks to Rubin's rules, leads to a less biased estimate of the parameters' variability. The resulting imputation-aware variance estimator of peptide intensities is then moderated using Bayesian hierarchical models. This estimator is finally included in moderated t-test statistics to provide differential analysis results. The workflow can be used at both the peptide and protein levels in quantification datasets: an aggregation step is included to obtain protein-level results from peptide-level quantification data. Our methodology, named mi4p, was compared to the state-of-the-art limma workflow implemented in the DAPAR R package, on both simulated and real datasets. We observed a trade-off between sensitivity and specificity, with mi4p outperforming DAPAR overall in terms of F-score.
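The description mentions pooling with Rubin's rules. As a rough illustration only, the standard scalar form of those rules can be sketched as follows. This is not the mi4p implementation: the function name `pool_rubin` and the example numbers are hypothetical, and the paper applies the idea to model parameters before moderating the variance, which this sketch does not cover.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool one scalar estimate across D imputed datasets (Rubin's rules).

    estimates        -- the estimate obtained on each imputed dataset
    within_variances -- the variance of that estimate on each imputed dataset
    Returns (pooled_estimate, total_variance).
    Hypothetical helper for illustration; not part of mi4p or DAPAR.
    """
    if len(estimates) != len(within_variances):
        raise ValueError("one within-imputation variance per estimate is required")
    d = len(estimates)
    if d < 2:
        raise ValueError("at least two imputed datasets are required")

    pooled = mean(estimates)             # average of the per-imputation estimates
    within = mean(within_variances)      # average within-imputation variance
    between = variance(estimates)        # between-imputation variance (D - 1 denominator)
    total = within + (1 + 1 / d) * between
    return pooled, total


# Made-up example: three imputed datasets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The between-imputation term is what a single-imputation analysis ignores: treating an imputed dataset as if it had always been complete amounts to reporting only the within-imputation variance.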
first_indexed 2024-04-11T11:20:18Z
id doaj.art-291ee1a22e624efab92f8e83695ef083
institution Directory Open Access Journal
issn 1553-734X
1553-7358
last_indexed 2024-04-11T11:20:18Z
spelling doaj.art-291ee1a22e624efab92f8e83695ef083; 2022-12-22T04:27:06Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2022-08-01; 18(8): e1010420; 10.1371/journal.pcbi.1010420; Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics; Marie Chion, Christine Carapito, Frédéric Bertrand; https://doi.org/10.1371/journal.pcbi.1010420
title Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics.