Modelling and comparing protein interaction networks using subgraph counts

Bibliographic Details
Main Author: Chegancas Rito, TM
Other Authors: Deane, C
Format: Thesis
Language: English
Published: 2012
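Subjects: Bioinformatics (biochemistry); Systems Biology; Computational biochemistry; Biology and other natural sciences (mathematics); Biology; Bioinformatics (life sciences); Biochemistry; Statistics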
_version_ 1797112960629342208
author Chegancas Rito, TM
author2 Deane, C
author_facet Deane, C
Chegancas Rito, TM
author_sort Chegancas Rito, TM
collection OXFORD
description The astonishing progress of molecular biology, engineering and computer science has resulted in mature technologies capable of examining multiple cellular components at a genome-wide scale. Protein-protein interactions are one example of such growing data. These data are often organised as networks with proteins as nodes and interactions as edges. Although these data are still incomplete, a substantial amount is now available, and there is a need for biologically meaningful methods to analyse and interpret these interactions.

In this thesis we focus on how to compare protein interaction networks (PINs) and on the relationship between network architecture and the biological characteristics of proteins. The underlying theme throughout the dissertation is the use of small subgraphs, that is, small interaction patterns between 2 and 5 proteins.

We start by examining two popular scores that are used to compare PINs and network models. When comparing networks of the same model type we find that the typical scores are highly unstable and depend on the number of nodes and edges in the networks. This is unsatisfactory, and we propose a method based on non-parametric statistics to make more meaningful comparisons. We also employ principal component analysis to judge model fit according to subgraph counts. From these analyses we show that no current model fits the PINs; this may well reflect our lack of knowledge about the evolution of protein interactions. Thus, we use explanatory variables such as protein age and protein structural class to find patterns in the interactions and subgraphs we observe. We discover that the yeast PIN is highly heterogeneous and therefore no single model is likely to fit the network. Instead, we focus on ego-networks, each containing an initial protein plus its interacting partners and their interaction partners. In the final chapter we propose a new, alignment-free method for network comparison based on such ego-networks. The method compares subgraph counts in neighbourhoods within PINs in an averaging, many-to-many fashion. It clusters networks of the same model type and is able to successfully reconstruct species phylogenies based solely on PIN data, providing exciting new directions for future research.
first_indexed 2024-03-07T05:14:36Z
format Thesis
id oxford-uuid:dcc0eb0d-1dd8-428d-b2ec-447a806d6aa8
institution University of Oxford
language English
last_indexed 2024-04-09T03:56:24Z
publishDate 2012
record_format dspace
spelling oxford-uuid:dcc0eb0d-1dd8-428d-b2ec-447a806d6aa8
2024-03-15T12:32:53Z
Modelling and comparing protein interaction networks using subgraph counts
Thesis
http://purl.org/coar/resource_type/c_db06
uuid:dcc0eb0d-1dd8-428d-b2ec-447a806d6aa8
Bioinformatics (biochemistry)
Systems Biology
Computational biochemistry
Biology and other natural sciences (mathematics)
Biology
Bioinformatics (life sciences)
Biochemistry
Statistics
English
Oxford University Research Archive - Valet
2012
Chegancas Rito, TM
Deane, C
Reinert, G
title Modelling and comparing protein interaction networks using subgraph counts
topic Bioinformatics (biochemistry)
Systems Biology
Computational biochemistry
Biology and other natural sciences (mathematics)
Biology
Bioinformatics (life sciences)
Biochemistry
Statistics