Selecting a significance level in sequential testing procedures for community detection

Abstract While there have been numerous sequential algorithms developed to estimate community structure in networks, there is little available guidance and study of what significance level or stopping parameter to use in these sequential testing procedures. Most algorithms rely on prespecifying the...


Bibliographic Details
Main Authors: Riddhi Pratim Ghosh, Ian Barnett
Format: Article
Language: English
Published: SpringerOpen 2023-08-01
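Series: Applied Network Science
Subjects: Community detection; Multiple testing; Sequential testing; Stochastic block model; Single cell RNA sequencing
Online Access: https://doi.org/10.1007/s41109-023-00567-2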
author Riddhi Pratim Ghosh
Ian Barnett
collection DOAJ
description Abstract While there have been numerous sequential algorithms developed to estimate community structure in networks, there is little available guidance and study of what significance level or stopping parameter to use in these sequential testing procedures. Most algorithms rely on prespecifying the number of communities or use an arbitrary stopping rule. We provide a principled approach to selecting a nominal significance level for sequential community detection procedures by controlling the tolerance ratio, defined as the ratio of underfitting and overfitting probability of estimating the number of clusters in fitting a network. We introduce an algorithm for specifying this significance level from a user-specified tolerance ratio, and demonstrate its utility with a sequential modularity maximization approach in a stochastic block model framework. We evaluate the performance of the proposed algorithm through extensive simulations and demonstrate its utility in controlling the tolerance ratio in single-cell RNA sequencing clustering by cell type and by clustering a congressional voting network.
first_indexed 2024-03-12T17:09:38Z
format Article
id doaj.art-cb0ae2989cf14d088dfe947a9ec53cc3
institution Directory Open Access Journal
issn 2364-8228
language English
last_indexed 2024-03-12T17:09:38Z
publishDate 2023-08-01
publisher SpringerOpen
record_format Article
series Applied Network Science
spelling doaj.art-cb0ae2989cf14d088dfe947a9ec53cc3; 2023-08-06T11:09:41Z; eng; SpringerOpen; Applied Network Science; 2364-8228; 2023-08-01; 81113; 10.1007/s41109-023-00567-2; Riddhi Pratim Ghosh (Department of Mathematics and Statistics, Bowling Green State University); Ian Barnett (Department of Biostatistics, University of Pennsylvania); https://doi.org/10.1007/s41109-023-00567-2
title Selecting a significance level in sequential testing procedures for community detection
topic Community detection
Multiple testing
Sequential testing
Stochastic block model
Single cell RNA sequencing
url https://doi.org/10.1007/s41109-023-00567-2