Generalized mean p-values for combining dependent tests: Comparison of generalized central limit theorem and robust risk analysis

Bibliographic Details
Main Author: Wilson, D
Format: Journal article
Language: English
Published: F1000Research 2020
Collection: OXFORD
Description: The test statistics underpinning several methods for combining p-values are special cases of the generalized mean p-value (GMP), including the minimum (Bonferroni procedure), the harmonic mean and the geometric mean. A key assumption influencing the practical performance of such methods concerns the dependence between p-values. Approaches that do not require specific knowledge of the dependence structure are practically convenient. Vovk and Wang derived significance thresholds for GMPs under the worst-case scenario of arbitrary dependence using results from Robust Risk Analysis (RRA). Here I calculate significance thresholds and closed testing procedures using the Generalized Central Limit Theorem (GCLT). The GCLT formally assumes independence, but enjoys a degree of robustness to dependence. The GCLT thresholds are less stringent than the RRA thresholds, with the disparity increasing as the exponent of the GMP (r) increases. I motivate a model of p-value dependence based on a Wishart-Multivariate-Gamma distribution for the underlying log-likelihood ratios. In simulations under this model, the RRA thresholds produced tests that were usually less powerful than Bonferroni, while the GCLT thresholds produced tests more powerful than Bonferroni, for all r > −∞. For r > −1, the GCLT thresholds suffered pronounced false positive rates. For r > −1/2, the standard central limit theorem applied and the GCLT thresholds no longer possessed any useful robustness to dependence. I consider the implications of these results in the context of various interpretations of GMPs, and conclude that the GCLT-based harmonic mean p-value procedure and Simes' (1986) test represent good compromises in the power-robustness trade-off for combining dependent tests.
First indexed: 2024-03-07T06:35:13Z
Identifier: oxford-uuid:f7639d68-80e3-4a03-ba04-aac05199ea9a
Institution: University of Oxford
Last indexed: 2024-03-07T06:35:13Z
Record format: dspace
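The description names the GMP family only in words. The following is a minimal Python sketch under stated assumptions. It uses the usual power-mean definition M_r = ((1/L) Σ p_i^r)^(1/r) for L p-values, and the symbol L and all function names are illustrative rather than taken from the article. The code computes M_r together with its limiting cases: the minimum as r → −∞, the geometric mean as r → 0, and the harmonic mean at r = −1. It also computes the Bonferroni and Simes combined p-values that the description compares against. It returns test statistics only. The GCLT and RRA significance thresholds derived in the article are not reproduced.

```python
import math
from typing import Sequence


def generalized_mean(pvalues: Sequence[float], r: float) -> float:
    """Power mean M_r = ((1/L) * sum(p_i ** r)) ** (1 / r) of L p-values.

    Limits: r = -inf gives the minimum, r = 0 the geometric mean,
    r = +inf the maximum; r = -1 is the harmonic mean.
    """
    if len(pvalues) == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        # p = 0 is excluded because log(0) and 0 ** r (r < 0) are undefined
        raise ValueError("p-values must lie in (0, 1]")
    L = len(pvalues)
    if r == -math.inf:
        return min(pvalues)
    if r == math.inf:
        return max(pvalues)
    logs = [math.log(p) for p in pvalues]
    if r == 0:
        # Geometric mean: exp of the average log p-value
        return math.exp(sum(logs) / L)
    # Log-sum-exp of r * log(p_i), shifted by its maximum for stability
    scaled = [r * x for x in logs]
    top = max(scaled)
    log_sum = top + math.log(sum(math.exp(s - top) for s in scaled))
    # log M_r = (log(sum p_i^r) - log L) / r
    return math.exp((log_sum - math.log(L)) / r)


def bonferroni(pvalues: Sequence[float]) -> float:
    """Bonferroni combined p-value: L * min(p), capped at 1."""
    return min(1.0, len(pvalues) * min(pvalues))


def simes(pvalues: Sequence[float]) -> float:
    """Simes combined p-value: min over i of L * p_(i) / i, capped at 1."""
    L = len(pvalues)
    ranked = sorted(pvalues)
    return min(1.0, min(L * p / i for i, p in enumerate(ranked, start=1)))


if __name__ == "__main__":
    p = [0.01, 0.04, 0.2, 0.5]
    for name, r in [("minimum", -math.inf), ("harmonic", -1.0),
                    ("geometric", 0.0), ("arithmetic", 1.0)]:
        print(f"{name:>10} (r={r}): {generalized_mean(p, r):.4f}")
    print(f"bonferroni: {bonferroni(p):.4f}")
    print(f"     simes: {simes(p):.4f}")
```

For p = [0.01, 0.04, 0.2, 0.5], the harmonic mean (r = −1) is 1/33 ≈ 0.0303 and the geometric mean (r = 0) is ≈ 0.0795. The Bonferroni and Simes combined p-values are both 0.04. The power mean is evaluated on the log scale because p_i^r can overflow when p_i is small and r is strongly negative.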