Using genetic data to identify transmission risk factors: Statistical assessment and application to tuberculosis transmission.

Bibliographic Details
Main Authors: Isaac H Goldstein, Damon Bayer, Ivan Barilar, Balladiah Kizito, Ogopotse Matsiri, Chawangwa Modongo, Nicola M Zetola, Stefan Niemann, Volodymyr M Minin, Sanghyuk S Shin
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-12-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1010696
author Isaac H Goldstein
Damon Bayer
Ivan Barilar
Balladiah Kizito
Ogopotse Matsiri
Chawangwa Modongo
Nicola M Zetola
Stefan Niemann
Volodymyr M Minin
Sanghyuk S Shin
collection DOAJ
description Identifying host factors that influence infectious disease transmission is an important step toward developing interventions to reduce disease incidence. Recent advances in methods for reconstructing infectious disease transmission events using pathogen genomic and epidemiological data open the door for investigation of host factors that affect onward transmission. While most transmission reconstruction methods are designed to work with densely sampled outbreaks, these methods are making their way into surveillance studies, where the fraction of sampled cases with sequenced pathogens could be relatively low. Surveillance studies that use transmission event reconstruction then use the reconstructed events as response variables (i.e., infection source status of each sampled case) and use host characteristics as predictors (e.g., presence of HIV infection) in regression models. We use simulations to study estimation of the effect of a host factor on the probability of being an infection source via this multi-step inferential procedure. Using TransPhylo, a widely used method for Bayesian estimation of infectious disease transmission events, and logistic regression, we find that low sensitivity of identifying infection sources leads to dilution of the signal, biasing logistic regression coefficients toward zero. We show that increasing the proportion of sampled cases improves sensitivity and some, but not all, properties of the logistic regression inference. Application of these approaches to real-world data from a population-based TB study in Botswana fails to detect an association between HIV infection and the probability of being a TB infection source. We conclude that application of a pipeline, where one first uses TransPhylo and sparsely sampled surveillance data to infer transmission events and then estimates effects of host characteristics on probabilities of these events, should be accompanied by a realistic simulation study to better understand biases stemming from imprecise transmission event inference.
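The dilution effect described in the abstract can be illustrated with a small calculation. The sketch below is not the authors' simulation and does not use TransPhylo; it assumes a much simpler setting in which infection source status is misclassified at the same rate regardless of the host factor, and in which the logistic regression has a single binary predictor, so that its coefficient equals a log odds ratio. The function name, the probabilities 0.2 and 0.4, and the sensitivity and specificity values are all hypothetical.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def observed_log_odds_ratio(p0, p1, sensitivity, specificity=1.0):
    """Log odds ratio targeted by a logistic regression on a misclassified outcome.

    p0, p1: true probabilities of being an infection source for hosts
            without and with the factor (hypothetical inputs).
    sensitivity, specificity: assumed to be identical in both host groups.
    """
    def observed(p):
        # True sources that are detected plus non-sources wrongly flagged.
        return sensitivity * p + (1.0 - specificity) * (1.0 - p)

    return logit(observed(p1)) - logit(observed(p0))


if __name__ == "__main__":
    p0, p1 = 0.2, 0.4  # hypothetical true source probabilities
    true_beta = observed_log_odds_ratio(p0, p1, sensitivity=1.0)
    print(f"true coefficient: {true_beta:.3f}")
    for se in (1.0, 0.6, 0.3):
        beta = observed_log_odds_ratio(p0, p1, se, specificity=0.95)
        print(f"sensitivity {se:.1f}: expected coefficient {beta:.3f}")
```

Under these assumptions the expected coefficient shrinks toward zero as sensitivity falls, which is the direction of bias the abstract reports. The size of the bias in a real analysis depends on how the reconstruction method errs, which is why the authors recommend a realistic simulation study.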
first_indexed 2024-04-11T04:13:36Z
format Article
id doaj.art-ff734196f7de4d52b684c0ed1d80a41d
institution Directory Open Access Journal
issn 1553-734X
1553-7358
language English
last_indexed 2024-04-11T04:13:36Z
publishDate 2022-12-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-ff734196f7de4d52b684c0ed1d80a41d 2023-01-01T05:31:09Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2022-12-01 18 12 e1010696 10.1371/journal.pcbi.1010696 https://doi.org/10.1371/journal.pcbi.1010696
title Using genetic data to identify transmission risk factors: Statistical assessment and application to tuberculosis transmission.
url https://doi.org/10.1371/journal.pcbi.1010696