A semi-supervised Bayesian approach for simultaneous protein sub-cellular localisation assignment and novelty detection.
The cell is compartmentalised into complex micro-environments allowing an array of specialised biological processes to be carried out in synchrony. Determining a protein's sub-cellular localisation to one or more of these compartments can therefore be a first step in determining its function. H...
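Main Authors: | Oliver M Crook, Aikaterini Geladaki, Daniel J H Nightingale, Owen L Vennard, Kathryn S Lilley, Laurent Gatto, Paul D W Kirk |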
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2020-11-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1008288 |
_version_ | 1830192952750637056 |
---|---|
author | Oliver M Crook Aikaterini Geladaki Daniel J H Nightingale Owen L Vennard Kathryn S Lilley Laurent Gatto Paul D W Kirk |
collection | DOAJ |
description | The cell is compartmentalised into complex micro-environments allowing an array of specialised biological processes to be carried out in synchrony. Determining a protein's sub-cellular localisation to one or more of these compartments can therefore be a first step in determining its function. High-throughput and high-accuracy mass spectrometry-based sub-cellular proteomic methods can now shed light on the localisation of thousands of proteins at once. Machine learning algorithms are then typically employed to make protein-organelle assignments. However, these algorithms are limited by insufficient and incomplete annotation. We propose a semi-supervised Bayesian approach to novelty detection, allowing the discovery of additional, previously unannotated sub-cellular niches. Inference in our model is performed in a Bayesian framework, allowing us to quantify uncertainty in the allocation of proteins to new sub-cellular niches, as well as in the number of newly discovered compartments. We apply our approach across 10 mass spectrometry-based spatial proteomic datasets, representing a diverse range of experimental protocols. Application of our approach to hyperLOPIT datasets validates its utility by recovering enrichment with chromatin-associated proteins without annotation and uncovers sub-nuclear compartmentalisation which was not identified in the original analysis. Moreover, using sub-cellular proteomics data from Saccharomyces cerevisiae, we uncover a novel group of proteins trafficking from the ER to the early Golgi apparatus. Overall, we demonstrate the potential for novelty detection to yield biologically relevant niches that are missed by current approaches. |
first_indexed | 2024-12-18T00:04:44Z |
format | Article |
id | doaj.art-be2f8d51ad0b455b86c3fa544d4a23a9 |
institution | Directory Open Access Journal |
issn | 1553-734X 1553-7358 |
language | English |
last_indexed | 2024-12-18T00:04:44Z |
publishDate | 2020-11-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS Computational Biology |
spelling | doaj.art-be2f8d51ad0b455b86c3fa544d4a23a9 2022-12-21T21:27:51Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2020-11-01 16 11 e1008288 10.1371/journal.pcbi.1008288 https://doi.org/10.1371/journal.pcbi.1008288 |
title | A semi-supervised Bayesian approach for simultaneous protein sub-cellular localisation assignment and novelty detection. |
url | https://doi.org/10.1371/journal.pcbi.1008288 |