Correcting a nonparametric two-sample graph hypothesis test for graphs with different numbers of vertices with applications to connectomics
Abstract Random graphs are statistical models that have many applications, ranging from neuroscience to social network analysis. Of particular interest in some applications is the problem of testing two random graphs for equality of generating distributions. Tang et al. (Bernoulli 23:1599–1630, 2017...
Main Authors: | Anton A. Alyakin, Joshua Agterberg, Hayden S. Helm, Carey E. Priebe |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2024-01-01 |
Series: | Applied Network Science |
Subjects: | Adjacency spectral embedding; Latent position graph; Random dot product graph |
Online Access: | https://doi.org/10.1007/s41109-023-00607-x |
_version_ | 1797363483042381824 |
---|---|
author | Anton A. Alyakin; Joshua Agterberg; Hayden S. Helm; Carey E. Priebe |
collection | DOAJ |
description | Abstract Random graphs are statistical models that have many applications, ranging from neuroscience to social network analysis. Of particular interest in some applications is the problem of testing two random graphs for equality of generating distributions. Tang et al. (Bernoulli 23:1599–1630, 2017) propose a test for this setting. This test consists of embedding the graph into a low-dimensional space via the adjacency spectral embedding (ASE) and subsequently using a kernel two-sample test based on the maximum mean discrepancy. However, if the two graphs being compared have an unequal number of vertices, the test of Tang et al. (Bernoulli 23:1599–1630, 2017) may not be valid. We demonstrate the intuition behind this invalidity and propose a correction that makes any subsequent kernel- or distance-based test valid. Our method relies on sampling based on the asymptotic distribution for the ASE. We call these altered embeddings the corrected adjacency spectral embeddings (CASE). We also show that CASE remedies the exchangeability problem of the original test and demonstrate the validity and consistency of the test that uses CASE via a simulation study. Lastly, we apply our proposed test to the problem of determining equivalence of generating distributions in human connectomes extracted from diffusion magnetic resonance imaging at different scales. |
first_indexed | 2024-03-08T16:22:00Z |
format | Article |
id | doaj.art-0a69544ee5414a8ba969e5128e32445f |
institution | Directory Open Access Journal |
issn | 2364-8228 |
language | English |
last_indexed | 2024-03-08T16:22:00Z |
publishDate | 2024-01-01 |
publisher | SpringerOpen |
record_format | Article |
series | Applied Network Science |
spelling | doaj.art-0a69544ee5414a8ba969e5128e32445f ; 2024-01-07T12:15:58Z ; eng ; SpringerOpen ; Applied Network Science ; 2364-8228 ; 2024-01-01 ; 9 ; 1 ; 1 ; 26 ; 10.1007/s41109-023-00607-x ; Correcting a nonparametric two-sample graph hypothesis test for graphs with different numbers of vertices with applications to connectomics ; Anton A. Alyakin [0] ; Joshua Agterberg [1] ; Hayden S. Helm [2] ; Carey E. Priebe [3] ; [0] Department of Applied Mathematics and Statistics, Johns Hopkins University ; [1] Department of Applied Mathematics and Statistics, Johns Hopkins University ; [2] Department of Applied Mathematics and Statistics, Johns Hopkins University ; [3] Department of Applied Mathematics and Statistics, Johns Hopkins University ; Abstract Random graphs are statistical models that have many applications, ranging from neuroscience to social network analysis. Of particular interest in some applications is the problem of testing two random graphs for equality of generating distributions. Tang et al. (Bernoulli 23:1599–1630, 2017) propose a test for this setting. This test consists of embedding the graph into a low-dimensional space via the adjacency spectral embedding (ASE) and subsequently using a kernel two-sample test based on the maximum mean discrepancy. However, if the two graphs being compared have an unequal number of vertices, the test of Tang et al. (Bernoulli 23:1599–1630, 2017) may not be valid. We demonstrate the intuition behind this invalidity and propose a correction that makes any subsequent kernel- or distance-based test valid. Our method relies on sampling based on the asymptotic distribution for the ASE. We call these altered embeddings the corrected adjacency spectral embeddings (CASE). We also show that CASE remedies the exchangeability problem of the original test and demonstrate the validity and consistency of the test that uses CASE via a simulation study. Lastly, we apply our proposed test to the problem of determining equivalence of generating distributions in human connectomes extracted from diffusion magnetic resonance imaging at different scales. ; https://doi.org/10.1007/s41109-023-00607-x ; Adjacency spectral embedding ; Latent position graph ; Random dot product graph |
title | Correcting a nonparametric two-sample graph hypothesis test for graphs with different numbers of vertices with applications to connectomics |
topic | Adjacency spectral embedding; Latent position graph; Random dot product graph |
url | https://doi.org/10.1007/s41109-023-00607-x |
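
The pipeline summarized in the abstract (embed each graph with the adjacency spectral embedding, then compare the two embeddings with a maximum mean discrepancy statistic) can be illustrated with a minimal NumPy sketch. This is not the authors' code and it does not implement the CASE correction. The function names, the Gaussian kernel, the bandwidth of 0.5, the one-dimensional latent positions, and the sign alignment used to handle the non-identifiability of the embedding are all assumptions made for illustration.

```python
import numpy as np


def sample_rdpg(latent, rng):
    """Sample an undirected, loop-free graph with edge probabilities latent @ latent.T."""
    P = latent @ latent.T
    upper = np.triu(rng.random(P.shape) < P, k=1)  # keep the strict upper triangle
    return (upper + upper.T).astype(float)         # symmetrize, zero diagonal


def adjacency_spectral_embedding(A, d):
    """Scale the d eigenvectors of largest |eigenvalue| by sqrt(|eigenvalue|)."""
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(eigvals))[::-1][:d]
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))


def gaussian_kernel(X, Y, bandwidth):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))


def mmd_squared(X, Y, bandwidth=0.5):
    """Biased (V-statistic) estimate of the squared maximum mean discrepancy."""
    return (gaussian_kernel(X, X, bandwidth).mean()
            + gaussian_kernel(Y, Y, bandwidth).mean()
            - 2 * gaussian_kernel(X, Y, bandwidth).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical setup: same latent distribution, unequal numbers of vertices.
    latent_small = rng.uniform(0.3, 0.8, size=(50, 1))
    latent_large = rng.uniform(0.3, 0.8, size=(500, 1))
    X = adjacency_spectral_embedding(sample_rdpg(latent_small, rng), d=1)
    Y = adjacency_spectral_embedding(sample_rdpg(latent_large, rng), d=1)
    # In one dimension the embedding is identified only up to sign; align both.
    X *= np.sign(X.sum())
    Y *= np.sign(Y.sum())
    print("MMD^2 statistic:", mmd_squared(X, Y))
```

The sketch stops at the test statistic on purpose. Turning it into a p-value by permuting the pooled embeddings presumes exchangeability, which is the assumption the abstract says can fail when the two graphs have different numbers of vertices.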