Optimized phylogenetic clustering of HIV-1 sequence data for public health applications.


Bibliographic Details
Main Authors:	Connor Chato, Yi Feng, Yuhua Ruan, Hui Xing, Joshua Herbeck, Marcia Kalish, Art F Y Poon
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2022-11-01
Series:	PLoS Computational Biology
Online Access:	https://doi.org/10.1371/journal.pcbi.1010745

collection	DOAJ
description	Clusters of genetically similar infections suggest rapid transmission and may indicate priorities for public health action or reveal underlying epidemiological processes. However, clusters often require user-defined thresholds and are sensitive to non-epidemiological factors, such as non-random sampling. Consequently, the ideal threshold for public health applications varies substantially across settings. Here, we present a method that selects optimal thresholds for phylogenetic (subset tree) clustering in a given population. We evaluated this method on HIV-1 pol datasets (n = 14,221 sequences) from four sites in the USA (Tennessee, Washington), Canada (Northern Alberta) and China (Beijing). Clusters were defined as the tips descending from an ancestral node (with a minimum bootstrap support of 95%) through a series of branches, each with a length below a given threshold. Next, we used pplacer to graft new cases onto the fixed tree by maximum likelihood. We evaluated the effect of varying branch-length thresholds on cluster growth, treated as a count outcome, by fitting two Poisson regression models: a null model that predicts growth from cluster size, and an alternative model that includes mean collection date as an additional covariate. The alternative model was favoured by AIC across most thresholds, with optimal thresholds (those giving the greatest difference in AIC) ranging from 0.007 to 0.013 across sites. Optimal thresholds were more variable when 80% of the data were re-sampled by location (IQR 0.008 to 0.016, n = 100 replicates). Our results, based on prospective phylogenetic cluster growth, suggest that thresholds effective for public health vary more than the thresholds typically used in clustering studies.
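The threshold-selection procedure described in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class and all function names are hypothetical; a cluster is assumed to be seeded at the highest not-yet-clustered internal node with at least 95% bootstrap support and to grow only along branches shorter than the threshold; clusters are assumed to need at least two tips; covariates are mean-centred; and both Poisson models are fitted by plain Newton-Raphson rather than with a statistics package. Growth counts are taken as given, since the grafting of new cases with pplacer is not reproduced here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: branch length to parent, bootstrap support (%), sampling date."""
    name: str
    length: float = 0.0
    support: float = 100.0
    date: float = 0.0
    children: list = field(default_factory=list)


def subset_tree_clusters(root, threshold, min_support=95.0):
    """Assumed rule: cluster the tips reachable from a supported node via short branches."""
    clusters, absorbed = [], set()

    def reach(start):
        # Depth-first walk that follows only branches shorter than the threshold.
        tips, stack = [], [start]
        while stack:
            node = stack.pop()
            absorbed.add(id(node))
            if not node.children:
                tips.append(node)
            stack.extend(c for c in node.children if c.length < threshold)
        return tips

    def visit(node):
        # Preorder: a node already absorbed into an ancestor's cluster cannot seed a new one.
        if node.children and id(node) not in absorbed and node.support >= min_support:
            tips = reach(node)
            if len(tips) >= 2:
                clusters.append(tips)
        for child in node.children:
            visit(child)

    visit(root)
    return clusters


def _solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poisson_aic(X, y, iters=100):
    """Fit a log-link Poisson regression by Newton-Raphson and return its AIC.
    Each row of X starts with 1.0 (intercept); y holds non-negative counts."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)

    def means():
        return [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]

    for _ in range(iters):
        mu = means()
        grad = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu)) for j in range(p)]
        hess = [[sum(mi * row[j] * row[k] for row, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-12:
            break
    loglik = sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, means()))
    return 2 * p - 2 * loglik


def delta_aic(sizes, mean_dates, growth):
    """AIC(null: size) minus AIC(alternative: size + mean date); positive favours the alternative."""
    ms = sum(sizes) / len(sizes)
    md = sum(mean_dates) / len(mean_dates)
    null = [[1.0, s - ms] for s in sizes]
    alt = [[1.0, s - ms, d - md] for s, d in zip(sizes, mean_dates)]
    return poisson_aic(null, growth) - poisson_aic(alt, growth)
```

To pick a threshold, one would repeat this for each candidate value, for example storing `deltas[t] = delta_aic(sizes, dates, growth)` and taking `max(deltas, key=deltas.get)`, which mirrors the abstract's criterion of the greatest difference in AIC. Raising the threshold merges clusters, so the cluster table, and hence the AIC comparison, changes with every candidate value.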
institution	Directory Open Access Journal
issn	1553-734X, 1553-7358
citation	PLoS Computational Biology 18(11): e1010745 (2022-11-01). https://doi.org/10.1371/journal.pcbi.1010745
