A Bayesian mixture modelling approach for spatial proteomics.
Analysis of the spatial sub-cellular distribution of proteins is of vital importance to fully understand context specific protein function. Some proteins can be found with a single location within a cell, but up to half of proteins may reside in multiple locations, can dynamically re-localise, or re...
Main Authors: | Oliver M Crook, Claire M Mulvey, Paul D W Kirk, Kathryn S Lilley, Laurent Gatto |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2018-11-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1006516 |
_version_ | 1798030201060327424 |
---|---|
author | Oliver M Crook; Claire M Mulvey; Paul D W Kirk; Kathryn S Lilley; Laurent Gatto |
author_sort | Oliver M Crook |
collection | DOAJ |
description | Analysis of the spatial sub-cellular distribution of proteins is of vital importance to fully understand context-specific protein function. Some proteins can be found in a single location within a cell, but up to half of proteins may reside in multiple locations, can dynamically re-localise, or reside within an unknown functional compartment. These considerations lead to uncertainty in associating a protein with a single location. Currently, mass spectrometry (MS) based spatial proteomics relies on supervised machine learning algorithms to assign proteins to sub-cellular locations based on common gradient profiles. However, such methods fail to quantify the uncertainty associated with sub-cellular class assignment. Here we reformulate the framework on which we perform statistical analysis. We propose a Bayesian generative classifier based on Gaussian mixture models to assign proteins probabilistically to sub-cellular niches; each protein thus has a probability distribution over sub-cellular locations. Bayesian computation is performed using the expectation-maximisation (EM) algorithm, as well as Markov-chain Monte-Carlo (MCMC). Our methodology allows proteome-wide uncertainty quantification, thus adding a further layer to the analysis of spatial proteomics. Our framework is flexible, allowing many different systems to be analysed, and reveals new modelling opportunities for spatial proteomics. We find our methods perform competitively with current state-of-the-art machine learning methods, whilst simultaneously providing more information. We highlight several examples where classification based on the support vector machine is unable to make any conclusions, while uncertainty quantification using our approach provides biologically intriguing results. To our knowledge this is the first Bayesian model of MS-based spatial proteomics data. |
first_indexed | 2024-04-11T19:37:28Z |
format | Article |
id | doaj.art-989e62b1c5c24672a9efd1061f8a9b92 |
institution | Directory Open Access Journal |
issn | 1553-734X 1553-7358 |
language | English |
last_indexed | 2024-04-11T19:37:28Z |
publishDate | 2018-11-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS Computational Biology |
title | A Bayesian mixture modelling approach for spatial proteomics. |
title_sort | bayesian mixture modelling approach for spatial proteomics |
url | https://doi.org/10.1371/journal.pcbi.1006516 |
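
The probabilistic assignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain maximum-likelihood Gaussian mixture fitted by EM, with labelled marker proteins held fixed and unlabelled proteins receiving a probability for each class. It omits the priors and the MCMC inference the abstract mentions. The function names, the `ridge` regulariser, and the synthetic two-class data are all assumptions made for illustration.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of a multivariate normal at each row of X."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    # Solve L z = (X - mean)^T; the column-wise sum of z**2 is the
    # squared Mahalanobis distance.
    z = np.linalg.solve(chol, (X - mean).T)
    maha = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)

def responsibilities(X, weights, means, covs):
    """E-step: posterior probability of each class for each row of X."""
    log_p = np.stack(
        [np.log(w) + log_gaussian(X, m, c)
         for w, m, c in zip(weights, means, covs)],
        axis=1,
    )
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def fit_semi_supervised_gmm(X_lab, y_lab, X_unl, n_classes,
                            n_iter=20, ridge=1e-3):
    """EM with one Gaussian per class (hypothetical helper).

    Assumes every class has at least one labelled marker. Labelled
    rows keep fixed one-hot responsibilities throughout.
    """
    X_all = np.vstack([X_lab, X_unl])
    d = X_all.shape[1]
    R_lab = np.eye(n_classes)[y_lab]
    # Zeros here mean the first M-step uses the labelled markers only.
    R_unl = np.zeros((len(X_unl), n_classes))
    for _ in range(n_iter):
        # M-step: weighted class proportions, means and covariances.
        R = np.vstack([R_lab, R_unl])
        Nk = R.sum(axis=0)
        weights = Nk / Nk.sum()
        means = (R.T @ X_all) / Nk[:, None]
        covs = []
        for k in range(n_classes):
            diff = X_all - means[k]
            cov = (R[:, k, None] * diff).T @ diff / Nk[k]
            covs.append(cov + ridge * np.eye(d))  # keep cov invertible
        # E-step: update class probabilities of unlabelled proteins.
        R_unl = responsibilities(X_unl, weights, means, covs)
    return R_unl, weights, means, covs

# Synthetic stand-in for protein profiles: two well-separated classes.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0]])
y_lab = np.repeat([0, 1], 10)
X_lab = centres[y_lab] + rng.normal(0.0, 0.5, size=(20, 2))
y_true = np.repeat([0, 1], 20)
X_unl = centres[y_true] + rng.normal(0.0, 0.5, size=(40, 2))

probs, weights, means, covs = fit_semi_supervised_gmm(
    X_lab, y_lab, X_unl, n_classes=2
)
```

Each row of `probs` is a probability distribution over the two classes for one unlabelled protein, so a protein whose row is close to uniform is one whose location the model is uncertain about, rather than one forced into a single class.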