A Bayesian mixture modelling approach for spatial proteomics.

Analysis of the spatial sub-cellular distribution of proteins is of vital importance to fully understand context-specific protein function. Some proteins can be found at a single location within a cell, but up to half of proteins may reside in multiple locations, can dynamically re-localise, or re...

Bibliographic Details
Main Authors: Oliver M Crook, Claire M Mulvey, Paul D W Kirk, Kathryn S Lilley, Laurent Gatto
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-11-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1006516
author Oliver M Crook
Claire M Mulvey
Paul D W Kirk
Kathryn S Lilley
Laurent Gatto
collection DOAJ
description Analysis of the spatial sub-cellular distribution of proteins is of vital importance to fully understand context-specific protein function. Some proteins can be found at a single location within a cell, but up to half of proteins may reside in multiple locations, can dynamically re-localise, or reside within an unknown functional compartment. These considerations lead to uncertainty in associating a protein to a single location. Currently, mass spectrometry (MS)-based spatial proteomics relies on supervised machine learning algorithms to assign proteins to sub-cellular locations based on common gradient profiles. However, such methods fail to quantify the uncertainty associated with sub-cellular class assignment. Here we reformulate the framework on which we perform statistical analysis. We propose a Bayesian generative classifier based on Gaussian mixture models to assign proteins probabilistically to sub-cellular niches, so that each protein has a probability distribution over sub-cellular locations. Bayesian computation is performed using the expectation-maximisation (EM) algorithm, as well as Markov-chain Monte-Carlo (MCMC). Our methodology allows proteome-wide uncertainty quantification, thus adding a further layer to the analysis of spatial proteomics. Our framework is flexible: it allows many different systems to be analysed and reveals new modelling opportunities for spatial proteomics. We find our methods perform competitively with current state-of-the-art machine learning methods, whilst simultaneously providing more information. We highlight several examples where classification based on the support vector machine is unable to draw any conclusions, while uncertainty quantification using our approach provides biologically intriguing results. To our knowledge, this is the first Bayesian model of MS-based spatial proteomics data.
first_indexed 2024-04-11T19:37:28Z
format Article
id doaj.art-989e62b1c5c24672a9efd1061f8a9b92
institution Directory of Open Access Journals (DOAJ)
issn 1553-734X
1553-7358
language English
last_indexed 2024-04-11T19:37:28Z
publishDate 2018-11-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-989e62b1c5c24672a9efd1061f8a9b92; 2022-12-22T04:06:49Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2018-11-01; 14; 11; e1006516; 10.1371/journal.pcbi.1006516; A Bayesian mixture modelling approach for spatial proteomics.; Oliver M Crook, Claire M Mulvey, Paul D W Kirk, Kathryn S Lilley, Laurent Gatto; https://doi.org/10.1371/journal.pcbi.1006516
title A Bayesian mixture modelling approach for spatial proteomics.
url https://doi.org/10.1371/journal.pcbi.1006516
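
The description above mentions a generative classifier based on Gaussian mixture models, fitted with the EM algorithm, that gives each protein a probability distribution over sub-cellular locations. The following is a minimal sketch of that general idea, not the authors' implementation: it uses plain maximum-likelihood EM with one Gaussian per class, no priors, and no outlier component, and all names (`fit_semi_supervised_gmm`, `X_lab`, `X_unl`, `ridge`) are hypothetical. Labelled marker proteins keep fixed class memberships; unlabelled proteins receive soft membership probabilities.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log density of a multivariate normal, evaluated at each row of X."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    # Solve L z = (x - mean) so that z.z is the squared Mahalanobis distance.
    z = np.linalg.solve(chol, (X - mean).T)
    maha = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def responsibilities(X, weights, means, covs):
    """E-step: posterior probability of each class for each row of X."""
    log_p = np.column_stack([
        np.log(weights[k]) + log_gaussian(X, means[k], covs[k])
        for k in range(len(weights))
    ])
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def fit_semi_supervised_gmm(X_lab, y_lab, X_unl, n_classes, n_iter=50, ridge=1e-6):
    """Maximum-likelihood EM with fixed labels for markers (illustrative only).

    Assumes every class has at least one labelled row in X_lab.
    Returns mixing weights, means, covariances, and the class
    probabilities of the unlabelled rows.
    """
    d = X_lab.shape[1]
    X_all = np.vstack([X_lab, X_unl])
    R_lab = np.eye(n_classes)[y_lab]                 # one-hot, never updated
    R_unl = np.zeros((X_unl.shape[0], n_classes))    # first M-step uses markers only
    for _ in range(n_iter + 1):
        # M-step: weighted estimates of weights, means, and covariances.
        R = np.vstack([R_lab, R_unl])
        Nk = R.sum(axis=0)
        weights = Nk / Nk.sum()
        means = (R.T @ X_all) / Nk[:, None]
        covs = []
        for k in range(n_classes):
            diff = X_all - means[k]
            cov = (R[:, [k]] * diff).T @ diff / Nk[k]
            covs.append(cov + ridge * np.eye(d))     # small ridge keeps cov invertible
        # E-step: update soft assignments of the unlabelled rows only.
        R_unl = responsibilities(X_unl, weights, means, covs)
    return weights, means, np.array(covs), R_unl


# Synthetic stand-in for protein profiles: three well-separated 2-D clusters.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X_lab = np.vstack([c + 0.5 * rng.standard_normal((10, 2)) for c in centres])
y_lab = np.repeat(np.arange(3), 10)
X_unl = np.vstack([c + 0.5 * rng.standard_normal((40, 2)) for c in centres])

weights, means, covs, probs = fit_semi_supervised_gmm(X_lab, y_lab, X_unl, 3)
print(probs[:3].round(3))  # per-class membership probabilities for three proteins
```

Each row of `probs` sums to one and can be read as the allocation probabilities of one unlabelled protein; a row with no dominant entry flags an uncertain assignment, which a hard classifier would not expose.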