Optimized phylogenetic clustering of HIV-1 sequence data for public health applications.

Bibliographic Details
Main Authors: Connor Chato, Yi Feng, Yuhua Ruan, Hui Xing, Joshua Herbeck, Marcia Kalish, Art F Y Poon
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-11-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1010745
collection DOAJ
description Clusters of genetically similar infections suggest rapid transmission and may indicate priorities for public health action or reveal underlying epidemiological processes. However, clusters often require user-defined thresholds and are sensitive to non-epidemiological factors, such as non-random sampling. Consequently, the ideal threshold for public health applications varies substantially across settings. Here, we show a method that selects optimal thresholds for phylogenetic (subset tree) clustering based on population. We evaluated this method on HIV-1 pol datasets (n = 14,221 sequences) from four sites in the USA (Tennessee, Washington), Canada (Northern Alberta) and China (Beijing). Clusters were defined by tips descending from an ancestral node (with a minimum bootstrap support of 95%) through a series of branches, each with a length below a given threshold. Next, we used pplacer to graft new cases to the fixed tree by maximum likelihood. We evaluated the effect of varying branch-length thresholds on cluster growth as a count outcome by fitting two Poisson regression models: a null model that predicts growth from cluster size, and an alternative model that includes mean collection date as an additional covariate. The alternative model was favoured by AIC across most thresholds, with optimal (greatest difference in AIC) thresholds ranging from 0.007 to 0.013 across sites. The range of optimal thresholds was more variable when re-sampling 80% of the data by location (IQR 0.008-0.016, n = 100 replicates). Our results use prospective phylogenetic cluster growth and suggest that there is more variation in effective thresholds for public health than those typically used in clustering studies.
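The cluster definition in the description (tips descending from a well-supported ancestral node through branches that are each shorter than a threshold) can be illustrated with a minimal sketch. This is one reading of that sentence, not the authors' implementation: the `Node` class, the `reachable_tips` and `find_clusters` helpers, the toy tree and its branch lengths are all hypothetical, and reporting only clusters of two or more tips is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical tree node; `length` is the branch leading to the parent."""
    name: str
    length: float = 0.0
    support: float = 0.0          # bootstrap support as a proportion (tips: unused)
    children: List["Node"] = field(default_factory=list)


def reachable_tips(node: Node, threshold: float) -> Tuple[List[str], List[Node]]:
    """Return tips reachable from `node` through branches all shorter than
    `threshold`, plus the child nodes that were cut off by a longer branch."""
    tips: List[str] = []
    cut: List[Node] = []
    for child in node.children:
        if child.length >= threshold:
            cut.append(child)                 # branch too long: stop here
        elif child.children:
            t, c = reachable_tips(child, threshold)
            tips += t
            cut += c
        else:
            tips.append(child.name)
    return tips, cut


def find_clusters(root: Node, threshold: float, min_support: float = 0.95):
    """Collect clusters (assumed here to need at least two tips) rooted at
    nodes whose bootstrap support is at least `min_support`."""
    clusters = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            continue                          # a lone tip is not reported
        tips, cut = reachable_tips(node, threshold)
        if node.support >= min_support and len(tips) >= 2:
            clusters.append(sorted(tips))
            stack.extend(cut)                 # keep searching below long branches
        else:
            stack.extend(node.children)
    return sorted(clusters)


# Toy tree with invented branch lengths and supports.
tree = Node("root", children=[
    Node("A", 0.03, 0.99, [Node("t1", 0.004), Node("t2", 0.006)]),
    Node("B", 0.02, 0.80, [Node("t3", 0.003), Node("t4", 0.005)]),
    Node("C", 0.01, 0.97, [
        Node("t5", 0.009),
        Node("D", 0.002, 0.96, [Node("t6", 0.02), Node("t7", 0.004)]),
    ]),
])

for threshold in (0.005, 0.01, 0.025):
    print(threshold, find_clusters(tree, threshold))
```

In this toy tree, raising the threshold from 0.01 to 0.025 pulls `t6` into the cluster under `C`, while `t3` and `t4` never cluster because their ancestor `B` falls below the support cutoff. This shows why the choice of threshold changes cluster membership and hence any downstream measure of cluster growth.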
first_indexed 2024-03-13T01:33:54Z
id doaj.art-67112e635a2a48d4927fd8683a38d966
institution Directory Open Access Journal
issn 1553-734X
1553-7358
last_indexed 2024-03-13T01:33:54Z
spelling doaj.art-67112e635a2a48d4927fd8683a38d966; 2023-07-04T05:31:28Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2022-11-01; volume 18, issue 11, e1010745; 10.1371/journal.pcbi.1010745