Comparison of large networks with sub-sampling strategies

Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Mos...

Full description

Bibliographic Details
Main Authors:	Ali, W, Wegner, A, Gaunt, R, Deane, C, Reinert, G
Format:	Journal article
Language:	English
Published:	Nature Publishing Group 2016

_version_	1797060656652877824
author	Ali, W Wegner, A Gaunt, R Deane, C Reinert, G
author_facet	Ali, W Wegner, A Gaunt, R Deane, C Reinert, G
author_sort	Ali, W
collection	OXFORD
description	Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Most existing methods either do not generalize well to different types of networks or do not provide a quantitative similarity score between networks. In contrast, alignment-free topology based network similarity scores empower us to analyse large sets of networks containing different types and sizes of data. Netdis is such a score that defines network similarity through the counts of small sub-graphs in the local neighbourhood of all nodes. Here, we introduce a sub-sampling procedure based on neighbourhoods which links naturally with the framework of network comparisons through local neighbourhood comparisons. Our theoretical arguments justify basing the Netdis statistic on a sample of similar-sized neighbourhoods. Our tests on empirical and synthetic datasets indicate that often only 10% of the neighbourhoods of a network suffice for optimal performance, leading to a drastic reduction in computational requirements. The sampling procedure is applicable even when only a small sample of the network is known, and thus provides a novel tool for network comparison of very large and potentially incomplete datasets.
first_indexed	2024-03-06T20:20:11Z
format	Journal article
id	oxford-uuid:2d8992f0-165a-4237-9d5b-6b46bfa78d2e
institution	University of Oxford
language	English
last_indexed	2024-03-06T20:20:11Z
publishDate	2016
publisher	Nature Publishing Group
record_format	dspace
spelling	oxford-uuid:2d8992f0-165a-4237-9d5b-6b46bfa78d2e2022-03-26T12:43:33ZComparison of large networks with sub-sampling strategiesJournal articlehttp://purl.org/coar/resource_type/c_dcae04bcuuid:2d8992f0-165a-4237-9d5b-6b46bfa78d2eEnglishSymplectic Elements at OxfordNature Publishing Group2016Ali, WWegner, AGaunt, RDeane, CReinert, GNetworks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Most existing methods either do not generalize well to different types of networks or do not provide a quantitative similarity score between networks. In contrast, alignment-free topology based network similarity scores empower us to analyse large sets of networks containing different types and sizes of data. Netdis is such a score that defines network similarity through the counts of small sub-graphs in the local neighbourhood of all nodes. Here, we introduce a sub-sampling procedure based on neighbourhoods which links naturally with the framework of network comparisons through local neighbourhood comparisons. Our theoretical arguments justify basing the Netdis statistic on a sample of similar-sized neighbourhoods. Our tests on empirical and synthetic datasets indicate that often only 10% of the neighbourhoods of a network suffice for optimal performance, leading to a drastic reduction in computational requirements. The sampling procedure is applicable even when only a small sample of the network is known, and thus provides a novel tool for network comparison of very large and potentially incomplete datasets.
spellingShingle	Ali, W Wegner, A Gaunt, R Deane, C Reinert, G Comparison of large networks with sub-sampling strategies
title	Comparison of large networks with sub-sampling strategies
title_full	Comparison of large networks with sub-sampling strategies
title_fullStr	Comparison of large networks with sub-sampling strategies
title_full_unstemmed	Comparison of large networks with sub-sampling strategies
title_short	Comparison of large networks with sub-sampling strategies
title_sort	comparison of large networks with sub sampling strategies
work_keys_str_mv	AT aliw comparisonoflargenetworkswithsubsamplingstrategies AT wegnera comparisonoflargenetworkswithsubsamplingstrategies AT gauntr comparisonoflargenetworkswithsubsamplingstrategies AT deanec comparisonoflargenetworkswithsubsamplingstrategies AT reinertg comparisonoflargenetworkswithsubsamplingstrategies

Comparison of large networks with sub-sampling strategies

Similar Items