Assessment of model fit via network comparison methods based on subgraph counts

Bibliographic Details
Main Authors: Ospina-Forero, L, Deane, C, Reinert, G
Format: Journal article
Published: Oxford University Press 2018
Description
Summary: While the number of network comparison methods is increasing, benchmarking of these methods is still in its infancy. The lack of understanding of complex dependencies among network characteristics makes it difficult to fully understand the meaning of the different network comparison methodologies and the relations between them. In this article, we use a Monte Carlo framework to address three general questions about network comparison methods based on subgraph counts: (1) Can the methods differentiate between networks generated from different network generation mechanisms? (2) Are the number of nodes or the average degree confounding factors for the comparison of networks? (3) Do all methods reach the same conclusions? We further use the Monte Carlo framework to test the fit of ER, Chung-Lu and a duplication–divergence model to the protein–protein interaction (PPI) networks of Yeast, Fly, Worm, Human, Escherichia coli, five herpes virus networks and five social networks. In contrast to previous claims in the literature, we show that the large PPI networks are not well modelled by the Chung-Lu model according to any of our tested methods. We find that network comparison statistics are not completely invariant to changes in the number of nodes and edges. Some methods, such as graphlet correlation distance, focus on fine-grained similarities, while other methods, such as Netdis, can capture the similarities of networks despite them having different numbers of nodes and edges.
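
The general idea of a Monte Carlo test of model fit based on a subgraph count can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the article: it uses a single statistic (the triangle count) in place of Netdis or graphlet correlation distance, it tests only an ER null model with the node and edge counts of the observed network, and the function names (er_graph, triangle_count, monte_carlo_p_value) are hypothetical.

```python
import random
from itertools import combinations


def er_graph(n, m, rng):
    """Sample an ER graph with n nodes and exactly m edges (adjacency sets)."""
    adj = {v: set() for v in range(n)}
    for u, v in rng.sample(list(combinations(range(n), 2)), m):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def triangle_count(adj):
    """Count triangles, visiting each one once via its ordered nodes u < v < w."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def monte_carlo_p_value(observed, n_sims=199, seed=0):
    """Upper-tail Monte Carlo p-value of the observed triangle count under ER.

    Assumption: the null model matches the observed number of nodes and edges.
    """
    rng = random.Random(seed)
    n = len(observed)
    m = sum(len(nbrs) for nbrs in observed.values()) // 2
    t_obs = triangle_count(observed)
    sims = [triangle_count(er_graph(n, m, rng)) for _ in range(n_sims)]
    # The +1 terms count the observed network as one draw from the null model.
    p = (1 + sum(t >= t_obs for t in sims)) / (n_sims + 1)
    return t_obs, p


# Toy "observed" network: ten disjoint 4-cliques (40 nodes, 60 edges).
observed = {v: set() for v in range(40)}
for block in range(10):
    for u, v in combinations(range(4 * block, 4 * block + 4), 2):
        observed[u].add(v)
        observed[v].add(u)

t_obs, p = monte_carlo_p_value(observed)
print(t_obs, p)
```

A small p-value here indicates that the ER model generates far fewer triangles than the toy network contains, so the model would be judged a poor fit on this statistic. Replacing er_graph with a sampler for another model, or triangle_count with a richer subgraph-based statistic, follows the same pattern.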