Causal Inference Is Not Just a Statistics Problem

Bibliographic Details
Main Authors: Lucy D’Agostino McGowan, Travis Gerke, Malcolm Barrett
Format: Article
Language: English
Published: Taylor & Francis Group 2024-05-01
Series: Journal of Statistics and Data Science Education
Online Access: https://www.tandfonline.com/doi/10.1080/26939169.2023.2276446
Description
Summary: This article introduces a collection of four datasets, similar to Anscombe’s quartet, that aim to highlight the challenges involved when estimating causal effects. Each of the four datasets is generated based on a distinct causal mechanism: the first involves a collider, the second involves a confounder, the third involves a mediator, and the fourth involves the induction of M-Bias by an included factor. The article includes a mathematical summary of each dataset, as well as directed acyclic graphs that depict the relationships between the variables. Despite the fact that the statistical summaries and visualizations for each dataset are identical, the true causal effect differs, and estimating it correctly requires knowledge of the data-generating mechanism. These example datasets can help practitioners gain a better understanding of the assumptions underlying causal inference methods and emphasize the importance of gathering more information beyond what can be obtained from statistical tools alone. The article also includes R code for reproducing all figures and provides access to the datasets themselves through an R package named “quartets.” Supplementary materials for this article are available online.
ISSN: 2693-9169