Confidence in protein interaction networks

Bibliographic Details
Main Author: Bozhilova, LV
Other Authors: Deane, C
Format: Thesis
Language: English
Published: 2019
Description
Summary: Protein interaction networks are a commonly used tool in bioinformatics, e.g. for the purposes of gene function prediction or drug target identification. They are built from often heterogeneous and error-prone protein-protein interaction data. In this thesis we study the effects of data uncertainty on the structure of protein interaction networks and on downstream network analysis.

Some databases provide confidence scores for protein-protein interactions, and networks are built from the data after a minimum score cut-off, or threshold, is applied. We study the effects of threshold choice on network structure. We argue that robust, biologically relevant network analysis results should be replicated across networks obtained at different thresholds, and develop a methodology for quantifying this robustness in the context of node metrics. Our results indicate that the same node metrics are robust across a range of protein interaction networks, but are not necessarily robust in synthetic networks.

We further investigate uncertain networks as a possible approach to incorporating confidence scores explicitly into network analysis. Uncertain networks are a way of conceptualising the difference between the "true" network of biologically relevant protein-protein interactions and the observed scored data. We show that any inference on the structure of the "true" network is strongly influenced by assumptions made about the dependence, or lack thereof, between edges in the scored network.

Finally, we focus on networks constructed from gene co-expression data. Gene co-expression can be measured in a number of different ways. Moreover, when networks are constructed, different thresholds can be applied to the co-expression values. It is not always clear which network construction method should be preferred. We develop a software package, COGENT, designed to aid network construction choice without the need for external validation data.
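
Illustration (not part of the thesis): the idea of applying a score threshold and then checking whether a node metric is stable across thresholds can be sketched roughly as follows. This is a minimal sketch under stated assumptions and does not reproduce the thesis's methodology. The edge list, the thresholds, the choice of degree as the node metric, and the pairwise agreement measure are all hypothetical choices made for this example.

```python
from itertools import combinations

# Hypothetical scored interactions: (protein_a, protein_b, confidence score).
# These values are invented for illustration, not taken from any database.
SCORED_EDGES = [
    ("A", "B", 0.95),
    ("A", "C", 0.80),
    ("A", "D", 0.65),
    ("B", "C", 0.90),
    ("C", "D", 0.55),
    ("D", "E", 0.70),
]


def degrees_at_threshold(scored_edges, threshold):
    """Keep edges scoring at or above the threshold; return each node's degree."""
    nodes = {node for a, b, _ in scored_edges for node in (a, b)}
    degree = dict.fromkeys(sorted(nodes), 0)
    for a, b, score in scored_edges:
        if score >= threshold:
            degree[a] += 1
            degree[b] += 1
    return degree


def _sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def rank_agreement(metric_x, metric_y):
    """Fraction of node pairs that both metrics order in the same way.

    A pair counts as agreeing when both metrics say "higher", both say
    "lower", or both say "tied". This is an assumed, simple stand-in for
    a rank correlation, chosen to keep the example dependency-free.
    """
    pairs = list(combinations(sorted(metric_x), 2))
    agreeing = sum(
        _sign(metric_x[u] - metric_x[v]) == _sign(metric_y[u] - metric_y[v])
        for u, v in pairs
    )
    return agreeing / len(pairs)


if __name__ == "__main__":
    low = degrees_at_threshold(SCORED_EDGES, 0.5)    # permissive network
    high = degrees_at_threshold(SCORED_EDGES, 0.75)  # stringent network
    print("degree at 0.50:", low)
    print("degree at 0.75:", high)
    print("pairwise rank agreement:", rank_agreement(low, high))
```

A value near 1 would suggest that the node ordering by degree changes little between the two thresholds, while a low value would suggest that conclusions drawn from degree depend on the cut-off. In practice one would compare many thresholds and several node metrics.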