Correcting a nonparametric two-sample graph hypothesis test for graphs with different numbers of vertices with applications to connectomics

Abstract Random graphs are statistical models that have many applications, ranging from neuroscience to social network analysis. Of particular interest in some applications is the problem of testing two random graphs for equality of generating distributions. Tang et al. (Bernoulli 23:1599–1630, 2017...


Bibliographic Details
Main Authors: Anton A. Alyakin, Joshua Agterberg, Hayden S. Helm, Carey E. Priebe
Format: Article
Language: English
Published: SpringerOpen 2024-01-01
Series: Applied Network Science
Subjects: Adjacency spectral embedding; Latent position graph; Random dot product graph
Online Access: https://doi.org/10.1007/s41109-023-00607-x
author Anton A. Alyakin
Joshua Agterberg
Hayden S. Helm
Carey E. Priebe
collection DOAJ
description Abstract Random graphs are statistical models that have many applications, ranging from neuroscience to social network analysis. Of particular interest in some applications is the problem of testing two random graphs for equality of generating distributions. Tang et al. (Bernoulli 23:1599–1630, 2017) propose a test for this setting. This test consists of embedding the graph into a low-dimensional space via the adjacency spectral embedding (ASE) and subsequently using a kernel two-sample test based on the maximum mean discrepancy. However, if the two graphs being compared have an unequal number of vertices, the test of Tang et al. (Bernoulli 23:1599–1630, 2017) may not be valid. We demonstrate the intuition behind this invalidity and propose a correction that makes any subsequent kernel- or distance-based test valid. Our method relies on sampling based on the asymptotic distribution for the ASE. We call these altered embeddings the corrected adjacency spectral embeddings (CASE). We also show that CASE remedies the exchangeability problem of the original test and demonstrate the validity and consistency of the test that uses CASE via a simulation study. Lastly, we apply our proposed test to the problem of determining equivalence of generating distributions in human connectomes extracted from diffusion magnetic resonance imaging at different scales.
first_indexed 2024-03-08T16:22:00Z
format Article
id doaj.art-0a69544ee5414a8ba969e5128e32445f
institution Directory Open Access Journal
issn 2364-8228
language English
last_indexed 2024-03-08T16:22:00Z
publishDate 2024-01-01
publisher SpringerOpen
record_format Article
series Applied Network Science
spelling doaj.art-0a69544ee5414a8ba969e5128e32445f
2024-01-07T12:15:58Z
eng
SpringerOpen
Applied Network Science
2364-8228
2024-01-01
Volume 9, Issue 1, pp. 1-26
10.1007/s41109-023-00607-x
Correcting a nonparametric two-sample graph hypothesis test for graphs with different numbers of vertices with applications to connectomics
Anton A. Alyakin (Department of Applied Mathematics and Statistics, Johns Hopkins University)
Joshua Agterberg (Department of Applied Mathematics and Statistics, Johns Hopkins University)
Hayden S. Helm (Department of Applied Mathematics and Statistics, Johns Hopkins University)
Carey E. Priebe (Department of Applied Mathematics and Statistics, Johns Hopkins University)
title Correcting a nonparametric two-sample graph hypothesis test for graphs with different numbers of vertices with applications to connectomics
topic Adjacency spectral embedding
Latent position graph
Random dot product graph
url https://doi.org/10.1007/s41109-023-00607-x