A data-adaptive method for investigating effect heterogeneity with high-dimensional covariates in Mendelian randomization

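Abstract

Background: Mendelian randomization is a popular method for causal inference with observational data that uses genetic variants as instrumental variables. As in a randomized trial, a standard Mendelian randomization analysis estimates the population-averaged effect of an exposure on an outcome. Dividing the population into subgroups can reveal effect heterogeneity and so inform who would benefit most from intervention on the exposure. However, because covariates are measured post-“randomization”, naive stratification typically induces collider bias in stratum-specific estimates.

Method: We extend a previously proposed stratification method (the “doubly-ranked method”) to form strata based on a single covariate, and introduce a data-adaptive random forest method that uses a high-dimensional covariate set to calculate stratum-specific estimates robust to collider bias. We also propose measures based on the Q statistic to assess heterogeneity between stratum-specific estimates (whether the estimates are more variable than expected by chance alone) and variable importance (which covariates are the key drivers of effect heterogeneity).

Result: We show that the effect of body mass index (BMI) on lung function is heterogeneous, depending most strongly on hip circumference and weight. While for most individuals the predicted effect of increasing BMI on lung function is negative, it is positive for some individuals and strongly negative for others.

Conclusion: Our data-adaptive approach allows effect heterogeneity in the relationship between an exposure and an outcome to be explored within a Mendelian randomization framework. This can yield valuable insights into disease aetiology and help identify specific groups of individuals who would derive the greatest benefit from targeted interventions on the exposure.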
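
The article builds on the doubly-ranked stratification method and uses a Q statistic to judge whether stratum-specific estimates vary more than chance alone would explain. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a single continuous instrument (for example, a genetic score), forms strata by applying the doubly-ranked rule to one covariate, computes a ratio (Wald) estimate with a first-order standard error in each stratum, and combines the estimates in Cochran's Q. The function names, the simulated data, and the handling of an incomplete final pre-stratum are all hypothetical choices; the random forest step is not sketched.

```python
import numpy as np


def doubly_ranked_strata(z, w, n_strata):
    """Doubly-ranked stratification on a covariate (illustrative sketch).

    1. Sort individuals by the instrument z.
    2. Cut the sorted list into consecutive pre-strata of n_strata people.
    3. Within each pre-stratum, rank by the covariate w and assign the
       k-th ranked individual to stratum k.
    Assumption: a final, incomplete pre-stratum fills only the lowest strata.
    """
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    strata = np.empty(len(z), dtype=int)
    order_z = np.argsort(z, kind="stable")
    for start in range(0, len(z), n_strata):
        block = order_z[start:start + n_strata]
        block_by_w = block[np.argsort(w[block], kind="stable")]
        strata[block_by_w] = np.arange(len(block_by_w))
    return strata


def ols_slope(z, v):
    """Slope and standard error from regressing v on z with an intercept."""
    zc = z - z.mean()
    slope = np.dot(zc, v - v.mean()) / np.dot(zc, zc)
    resid = v - v.mean() - slope * zc
    se = np.sqrt(np.dot(resid, resid) / (len(z) - 2) / np.dot(zc, zc))
    return slope, se


def ratio_estimate(z, x, y):
    """Ratio (Wald) estimate beta_Y / beta_X with a first-order standard error."""
    bx, _ = ols_slope(z, x)
    by, se_by = ols_slope(z, y)
    return by / bx, se_by / abs(bx)


def cochran_q(est, se):
    """Cochran's Q: weighted squared deviations from the inverse-variance pooled estimate."""
    est = np.asarray(est, dtype=float)
    wts = 1.0 / np.asarray(se, dtype=float) ** 2
    pooled = np.sum(wts * est) / np.sum(wts)
    return np.sum(wts * (est - pooled) ** 2), len(est) - 1


# Hypothetical simulated data: the effect of x on y increases with covariate w.
rng = np.random.default_rng(0)
n, K = 20_000, 5
z = rng.normal(size=n)                 # instrument, e.g. a genetic score
w = rng.normal(size=n)                 # covariate used for stratification
u = rng.normal(size=n)                 # unmeasured confounder
x = 0.5 * z + 0.3 * w + u + rng.normal(size=n)
y = (0.2 + 0.3 * w) * x + u + rng.normal(size=n)

strata = doubly_ranked_strata(z, w, K)
est, se = zip(*(ratio_estimate(z[strata == k], x[strata == k], y[strata == k])
                for k in range(K)))
q, df = cochran_q(est, se)
print("stratum estimates:", np.round(est, 3))
print(f"Q = {q:.1f} on {df} degrees of freedom")
```

In this sketch, Q would be compared with a chi-squared distribution on K - 1 degrees of freedom (for example with `scipy.stats.chi2.sf(q, df)`); a large value suggests the stratum-specific estimates differ by more than chance. Ranking on the instrument first is what keeps each stratum's instrument distribution comparable, which is the mechanism the doubly-ranked method relies on to limit collider bias.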

Bibliographic Details
Main Authors:	Haodong Tian, Brian D. M. Tom, Stephen Burgess
Format:	Article
Language:	English
Published:	BMC 2024-02-01
Series:	BMC Medical Research Methodology
Subjects:	Genetics; Instrumental variable; Stratification; Heterogenous effect; Random forest; Variable importance
Online Access:	https://doi.org/10.1186/s12874-024-02153-1