Selecting a significance level in sequential testing procedures for community detection
Abstract While numerous sequential algorithms have been developed to estimate community structure in networks, there is little guidance on, or study of, what significance level or stopping parameter to use in these sequential testing procedures. Most algorithms rely on prespecifying the...
Main Authors: | Riddhi Pratim Ghosh, Ian Barnett |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2023-08-01 |
Series: | Applied Network Science |
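Subjects: | Community detection, Multiple testing, Sequential testing, Stochastic block model, Single cell RNA sequencing |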
Online Access: | https://doi.org/10.1007/s41109-023-00567-2 |
_version_ | 1797752863961645056 |
---|---|
author | Riddhi Pratim Ghosh; Ian Barnett |
author_sort | Riddhi Pratim Ghosh |
collection | DOAJ |
description | Abstract While numerous sequential algorithms have been developed to estimate community structure in networks, there is little guidance on, or study of, what significance level or stopping parameter to use in these sequential testing procedures. Most algorithms rely on prespecifying the number of communities or use an arbitrary stopping rule. We provide a principled approach to selecting a nominal significance level for sequential community detection procedures by controlling the tolerance ratio, defined as the ratio of the underfitting and overfitting probabilities in estimating the number of clusters when fitting a network. We introduce an algorithm for specifying this significance level from a user-specified tolerance ratio, and demonstrate its utility with a sequential modularity maximization approach in a stochastic block model framework. We evaluate the performance of the proposed algorithm through extensive simulations and demonstrate its utility in controlling the tolerance ratio in single-cell RNA sequencing clustering by cell type and in clustering a congressional voting network. |
first_indexed | 2024-03-12T17:09:38Z |
format | Article |
id | doaj.art-cb0ae2989cf14d088dfe947a9ec53cc3 |
institution | Directory Open Access Journal |
issn | 2364-8228 |
language | English |
last_indexed | 2024-03-12T17:09:38Z |
publishDate | 2023-08-01 |
publisher | SpringerOpen |
record_format | Article |
series | Applied Network Science |
spelling | doaj.art-cb0ae2989cf14d088dfe947a9ec53cc3; 2023-08-06T11:09:41Z; eng; SpringerOpen; Applied Network Science; 2364-8228; 2023-08-01; 81113; 10.1007/s41109-023-00567-2; Selecting a significance level in sequential testing procedures for community detection; Riddhi Pratim Ghosh 0; Ian Barnett 1; Department of Mathematics and Statistics, Bowling Green State University; Department of Biostatistics, University of Pennsylvania; Abstract While numerous sequential algorithms have been developed to estimate community structure in networks, there is little guidance on, or study of, what significance level or stopping parameter to use in these sequential testing procedures. Most algorithms rely on prespecifying the number of communities or use an arbitrary stopping rule. We provide a principled approach to selecting a nominal significance level for sequential community detection procedures by controlling the tolerance ratio, defined as the ratio of the underfitting and overfitting probabilities in estimating the number of clusters when fitting a network. We introduce an algorithm for specifying this significance level from a user-specified tolerance ratio, and demonstrate its utility with a sequential modularity maximization approach in a stochastic block model framework. We evaluate the performance of the proposed algorithm through extensive simulations and demonstrate its utility in controlling the tolerance ratio in single-cell RNA sequencing clustering by cell type and in clustering a congressional voting network.; https://doi.org/10.1007/s41109-023-00567-2; Community detection; Multiple testing; Sequential testing; Stochastic block model; Single cell RNA sequencing |
title | Selecting a significance level in sequential testing procedures for community detection |
topic | Community detection; Multiple testing; Sequential testing; Stochastic block model; Single cell RNA sequencing |
url | https://doi.org/10.1007/s41109-023-00567-2 |