A semi-supervised Bayesian approach for simultaneous protein sub-cellular localisation assignment and novelty detection.

The cell is compartmentalised into complex micro-environments allowing an array of specialised biological processes to be carried out in synchrony. Determining a protein's sub-cellular localisation to one or more of these compartments can therefore be a first step in determining its function.

Bibliographic Details
Main Authors: Oliver M Crook, Aikaterini Geladaki, Daniel J H Nightingale, Owen L Vennard, Kathryn S Lilley, Laurent Gatto, Paul D W Kirk
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-11-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1008288
author Oliver M Crook
Aikaterini Geladaki
Daniel J H Nightingale
Owen L Vennard
Kathryn S Lilley
Laurent Gatto
Paul D W Kirk
collection DOAJ
description The cell is compartmentalised into complex micro-environments allowing an array of specialised biological processes to be carried out in synchrony. Determining a protein's sub-cellular localisation to one or more of these compartments can therefore be a first step in determining its function. High-throughput and high-accuracy mass spectrometry-based sub-cellular proteomic methods can now shed light on the localisation of thousands of proteins at once. Machine learning algorithms are then typically employed to make protein-organelle assignments. However, these algorithms are limited by insufficient and incomplete annotation. We propose a semi-supervised Bayesian approach to novelty detection, allowing the discovery of additional, previously unannotated sub-cellular niches. Inference in our model is performed in a Bayesian framework, allowing us to quantify uncertainty in the allocation of proteins to new sub-cellular niches, as well as in the number of newly discovered compartments. We apply our approach across 10 mass spectrometry based spatial proteomic datasets, representing a diverse range of experimental protocols. Application of our approach to hyperLOPIT datasets validates its utility by recovering enrichment with chromatin-associated proteins without annotation and uncovers sub-nuclear compartmentalisation which was not identified in the original analysis. Moreover, using sub-cellular proteomics data from Saccharomyces cerevisiae, we uncover a novel group of proteins trafficking from the ER to the early Golgi apparatus. Overall, we demonstrate the potential for novelty detection to yield biologically relevant niches that are missed by current approaches.
first_indexed 2024-12-18T00:04:44Z
format Article
id doaj.art-be2f8d51ad0b455b86c3fa544d4a23a9
institution Directory of Open Access Journals
issn 1553-734X
1553-7358
language English
last_indexed 2024-12-18T00:04:44Z
publishDate 2020-11-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-be2f8d51ad0b455b86c3fa544d4a23a9 2022-12-21T21:27:51Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2020-11-01 volume 16, issue 11, e1008288 10.1371/journal.pcbi.1008288
title A semi-supervised Bayesian approach for simultaneous protein sub-cellular localisation assignment and novelty detection.
url https://doi.org/10.1371/journal.pcbi.1008288