Consequences of relying on statistical significance: some illustrations

Bibliographic Details
Main Authors: Van Calster, B; Steyerberg, E; Collins, G; Smits, T
Format: Journal article
Language: English
Published: Wiley, 2018
Institution: University of Oxford
Record ID: oxford-uuid:97bdf8a3-2144-4b68-a107-929620722732

Description

Background: Despite regular criticisms of null hypothesis significance testing (NHST), a focus on testing persists, sometimes in the belief that it is required for publication and sometimes encouraged by journal reviewers. This paper aims to demonstrate key known limitations of NHST using simple, non-technical illustrations.

Design: The first illustration is based on simulated data from 20,000 studies comparing two groups on an outcome event. The true effect size (difference in event rates) and the sample size (20 to 100 per group) were varied. The second illustration used real data from a meta-analysis on alpha blockers for the treatment of ureteric stones.

Results: The simulations demonstrated the large between-study variability of p-values (ranging between <0.0001 and 1 for most simulation conditions). A focus on statistically significant effects (p < 0.05), notably in small to moderate samples, led to strongly overestimated effect sizes (up to 240%) and many false positive conclusions, i.e. statistically significant effects that were in fact true null effects. Effect sizes also showed strong between-study variability, but confidence intervals accounted for this: the interval width decreased with larger sample sizes, and the percentage of intervals that contained the true effect size was accurate across simulation conditions. Reducing the alpha level, as recently suggested, reduced false positive conclusions but strongly increased the overestimation of significant effects (up to 320%).

Conclusions: Researchers and journals should abandon statistical significance as a pivotal element in most scientific publications. Confidence intervals around effect sizes are more informative, but should not merely be reported to comply with journal requirements.
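
The design described above can be made concrete with a small simulation. The sketch below is a minimal illustration of that kind of experiment, not the authors' code: it assumes a baseline event rate of 0.3 in the control group and a two-proportion z-test for the p-value (neither is specified in the abstract), simulates 20,000 two-group studies per setting, and compares the average estimated effect across all studies with the average among only the statistically significant ones.

# Minimal sketch of the kind of simulation described above, not the authors' exact code.
# Assumptions (not stated in the record): control-group event rate 0.3,
# two-proportion z-test, 20,000 simulated studies per setting.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate(true_diff, n_per_group, n_studies=20_000, p_control=0.3, alpha=0.05):
    """Simulate two-group studies with a binary outcome and summarise what
    conditioning on statistical significance does to the estimated effect."""
    p_treat = p_control + true_diff
    # Observed event counts and event rates in each simulated study
    events_c = rng.binomial(n_per_group, p_control, n_studies)
    events_t = rng.binomial(n_per_group, p_treat, n_studies)
    est_diff = events_t / n_per_group - events_c / n_per_group

    # Two-proportion z-test with a pooled standard error
    pooled = (events_c + events_t) / (2 * n_per_group)
    se = np.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = np.divide(est_diff, se, out=np.zeros_like(est_diff), where=se > 0)
    p_values = 2 * stats.norm.sf(np.abs(z))

    significant = p_values < alpha
    return {
        "p-value range": (p_values.min(), p_values.max()),
        "share significant": significant.mean(),
        "mean effect (all studies)": est_diff.mean(),
        "mean effect (significant only)": est_diff[significant].mean()
        if significant.any() else float("nan"),
    }

# Small true effect, small sample: significant studies overestimate the effect.
print(simulate(true_diff=0.10, n_per_group=50))
# True null effect: the share of significant studies is the false-positive rate.
print(simulate(true_diff=0.0, n_per_group=50))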