On the Q statistic with constant weights in meta-analysis of binary outcomes

Bibliographic Details
Main Authors: Elena Kulinskaya, David C. Hoaglin
Format: Article
Language: English
Published: BMC 2023-06-01
Series: BMC Medical Research Methodology
Online Access: https://doi.org/10.1186/s12874-023-01939-z
Description
Summary: Abstract. Background: Cochran’s Q statistic is routinely used for testing heterogeneity in meta-analysis. Its expected value (under an incorrect null distribution) is part of several popular estimators of the between-study variance, $\tau^2$. Those applications generally do not account for use of the studies’ estimated variances in the inverse-variance weights that define Q (more explicitly, $Q_{IV}$). Importantly, those weights make approximating the distribution of $Q_{IV}$ rather complicated. Methods: As an alternative, we are investigating a Q statistic, $Q_F$, whose constant weights use only the studies’ arm-level sample sizes. For log-odds-ratio (LOR), log-relative-risk (LRR), and risk difference (RD) as the measures of effect, we study, by simulation, approximations to distributions of $Q_{IV}$ and $Q_F$, as the basis for tests of heterogeneity. Results: The results show that, for LOR and LRR, a two-moment gamma approximation to the distribution of $Q_F$ works well for small sample sizes, and an approximation based on an algorithm of Farebrother is recommended for larger sample sizes. For RD, the Farebrother approximation works very well, even for small sample sizes. For $Q_{IV}$, the standard chi-square approximation provides levels that are much too low for LOR and LRR and too high for RD. The Kulinskaya et al. (Res Synth Methods 2:254–70, 2011) approximation for RD and the Kulinskaya and Dollinger (BMC Med Res Methodol 15:49, 2015) approximation for LOR work well for $n \ge 100$ but have some convergence issues for very small sample sizes combined with small probabilities. Conclusions: The performance of the standard $\chi^2$ approximation is inadequate for all three binary effect measures. Instead, we recommend a test of heterogeneity based on $Q_F$ and provide practical guidelines for choosing an appropriate test at the .05 level for all three effect measures.
ISSN: 1471-2288
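
Illustration (not part of the record): the summary contrasts a Q statistic with inverse-variance weights against one whose constant weights depend only on arm-level sample sizes. The Python sketch below shows that contrast for the log-odds-ratio. Several details are assumptions made for illustration and are not confirmed by the summary: the constant weight is taken to be n_T * n_C / (n_T + n_C), 0.5 is added to every cell of each 2x2 table, Q is left unnormalized, and the study counts are made up. The function names are hypothetical.

```python
import math


def log_odds_ratio(x_t, n_t, x_c, n_c, cc=0.5):
    """Return (LOR, estimated variance) for one study.

    x_t, x_c: event counts in the treatment and control arms.
    n_t, n_c: arm-level sample sizes.
    cc: amount added to every cell (an assumption of this sketch).
    """
    a, b = x_t + cc, n_t - x_t + cc
    c, d = x_c + cc, n_c - x_c + cc
    lor = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return lor, var


def q_statistic(effects, weights):
    """Weighted sum of squared deviations from the weighted mean effect."""
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))


# Hypothetical studies: (events_T, n_T, events_C, n_C).
studies = [(12, 50, 8, 50), (30, 100, 22, 100), (5, 40, 9, 40)]

effects, variances = zip(*(log_odds_ratio(*s) for s in studies))

# Inverse-variance weights depend on the estimated variances (Q_IV).
w_iv = [1 / v for v in variances]

# Constant weights depend only on arm-level sample sizes (Q_F);
# this particular form is assumed here.
w_f = [n_t * n_c / (n_t + n_c) for _, n_t, _, n_c in studies]

q_iv = q_statistic(effects, w_iv)
q_f = q_statistic(effects, w_f)
print(f"Q_IV = {q_iv:.4f}, Q_F = {q_f:.4f}")
```

Neither value should be referred to a chi-square distribution on the strength of this sketch alone. Per the summary, the reference distribution for $Q_F$ is approximated by a two-moment gamma distribution or by Farebrother's algorithm, neither of which is implemented here.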