Causal Modeling to Mitigate Selection Bias and Unmeasured Confounding in Internet-Based Epidemiology of COVID-19: Model Development and Validation


Bibliographic Details
Main Authors: Nathaniel Stockham, Peter Washington, Brianna Chrisman, Kelley Paskov, Jae-Yoon Jung, Dennis Paul Wall
Format: Article
Language: English
Published: JMIR Publications 2022-07-01
Series: JMIR Public Health and Surveillance
Online Access: https://publichealth.jmir.org/2022/7/e31306
author Nathaniel Stockham
Peter Washington
Brianna Chrisman
Kelley Paskov
Jae-Yoon Jung
Dennis Paul Wall
collection DOAJ
description Background: Selection bias and unmeasured confounding are fundamental problems in epidemiology that threaten a study's internal and external validity. These phenomena are particularly dangerous in internet-based public health surveillance, where traditional mitigation and adjustment methods are inapplicable, unavailable, or out of date. Recent theoretical advances in causal modeling can mitigate these threats, but these innovations have not been widely deployed in the epidemiological community.
Objective: The purpose of our paper is to demonstrate the practical utility of causal modeling to both detect unmeasured confounding and selection bias and guide model selection to minimize bias. We implemented this approach in an applied epidemiological study of the COVID-19 cumulative infection rate in the New York City (NYC) spring 2020 epidemic.
Methods: We collected primary data from Qualtrics surveys of Amazon Mechanical Turk (MTurk) crowd workers residing in New Jersey and New York State across 2 sampling periods: April 11-14 and May 8-11, 2020. The surveys queried the subjects on household health status and demographic characteristics. We constructed a set of possible causal models of household infection and survey selection mechanisms and ranked them by compatibility with the collected survey data. The most compatible causal model was then used to estimate the cumulative infection rate in each survey period.
Results: There were 527 and 513 responses collected for the 2 periods, respectively. Response demographics were highly skewed toward a younger age in both survey periods. Despite the extremely strong relationship between age and COVID-19 symptoms, we recovered minimally biased estimates of the cumulative infection rate using only primary data and the most compatible causal model, with a relative bias of +3.8% and –1.9% from the reported cumulative infection rate for the first and second survey periods, respectively.
Conclusions: We successfully recovered accurate estimates of the cumulative infection rate from an internet-based crowdsourced sample despite considerable selection bias and unmeasured confounding in the primary data. This implementation demonstrates how simple applications of structural causal modeling can be effectively used to determine falsifiable model conditions, detect selection bias and confounding factors, and minimize estimate bias through model selection in a novel epidemiological context. As the disease and social dynamics of COVID-19 continue to evolve, public health surveillance protocols must continue to adapt; the emergence of Omicron variants and the shift to at-home testing are recent challenges. Rigorous and transparent methods to develop, deploy, and diagnose adapted surveillance protocols will be critical to their success.
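The abstract reports accuracy as a signed relative bias against a reported reference rate. As a minimal sketch of that arithmetic only (not the authors' code), the helper below computes (estimate - reference) / reference. The function name `relative_bias` and the example rates are hypothetical and are not taken from the study.

```python
def relative_bias(estimate: float, reference: float) -> float:
    """Signed relative bias of an estimate against a reference value, as a fraction."""
    if reference == 0:
        raise ValueError("reference must be nonzero")
    return (estimate - reference) / reference


# Hypothetical rates for illustration only; these are NOT the study's values.
reference_rate = 0.20   # assumed reported cumulative infection rate
estimated_rate = 0.21   # assumed model-based estimate

# Format the signed fraction as a percentage with an explicit sign.
print(f"{relative_bias(estimated_rate, reference_rate):+.1%}")
```

A positive value means the estimate overshoots the reference; a negative value means it undershoots.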
first_indexed 2024-03-12T12:50:39Z
format Article
id doaj.art-839c8f34d70d47f990ed7d35df1f5b24
institution Directory of Open Access Journals
issn 2369-2960
language English
last_indexed 2024-03-12T12:50:39Z
publishDate 2022-07-01
publisher JMIR Publications
record_format Article
series JMIR Public Health and Surveillance
spelling doaj.art-839c8f34d70d47f990ed7d35df1f5b24 2023-08-28T22:44:30Z eng
JMIR Publications, JMIR Public Health and Surveillance, ISSN 2369-2960, 2022-07-01, volume 8, issue 7, e31306
DOI 10.2196/31306
Nathaniel Stockham https://orcid.org/0000-0002-0752-6801
Peter Washington https://orcid.org/0000-0003-3276-4411
Brianna Chrisman https://orcid.org/0000-0002-7157-607X
Kelley Paskov https://orcid.org/0000-0002-5252-1401
Jae-Yoon Jung https://orcid.org/0000-0001-7948-9803
Dennis Paul Wall https://orcid.org/0000-0002-7889-9146
title Causal Modeling to Mitigate Selection Bias and Unmeasured Confounding in Internet-Based Epidemiology of COVID-19: Model Development and Validation
url https://publichealth.jmir.org/2022/7/e31306