Mind your Ps: A probabilistic model to aid the interpretation of molecular epidemiology data

Summary: Background: Assessing relatedness of pathogen sequences in clinical samples is a core goal in molecular epidemiology. Tools for Bayesian analysis of phylogeny, such as the BEAST software package, have been typically used in the analysis of sequence/time data in public health. However, they...


Bibliographic Details
Main Authors: Ana Raquel Penedos, Aurora Fernández-García, Mihaela Lazar, Kajal Ralh, David Williams, Kevin E. Brown
Format: Article
Language: English
Published: Elsevier 2022-05-01
Series: EBioMedicine
Subjects: Measles, Outbreak, Elimination, Epidemiology, Molecular epidemiology, Clinical virology
Online Access: http://www.sciencedirect.com/science/article/pii/S2352396422001736
author Ana Raquel Penedos
Aurora Fernández-García
Mihaela Lazar
Kajal Ralh
David Williams
Kevin E. Brown
collection DOAJ
description Summary: Background: Assessing relatedness of pathogen sequences in clinical samples is a core goal in molecular epidemiology. Tools for Bayesian analysis of phylogeny, such as the BEAST software package, have been typically used in the analysis of sequence/time data in public health. However, they are computationally-, time-, and knowledge-intensive, demanding resources that many laboratories do not have available or cannot allocate frequently. Methods: To evaluate a faster and simpler alternative method to support the routine interpretation of sequence data for epidemiology, we obtained sequences for two regions in the measles virus genome, N-450 and MF-NCR, from patient samples of genotypes B3, D4 and D8 taken between 2011 and 2017 in the UK and Romania. A mathematical model incorporating time, possible shared ancestry and the Poisson distribution describing the number of expected substitutions at a given time point was developed to exclude epidemiological relatedness between pairs of sequences. The model was validated against the commonly used Bayesian phylogenetic method using an independent dataset collected in 2017–19. Findings: We demonstrate that our model, using time and sequence information to predict whether two samples may be related within a given time frame, minimises the risk of erroneous exclusion of relatedness. An easy-to-use implementation in the form of a guide and spreadsheet is provided for convenient application. Interpretation: The proposed model only requires a previously calculated substitution rate for the locus and pathogen of interest. It allows for an informed but quick decision on the likelihood of relatedness between two samples within a time frame, without the need for phylogenetic reconstruction, thus facilitating rapid epidemiological interpretation of sequence data. Funding: This work was funded by the United Kingdom Health Security Agency (UKHSA). 
The World Health Organization European Regional Office funded training visits to UKHSA for Aurora Fernández-García and Mihaela Lazar.
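The Methods passage of the description above outlines a model that combines time, possible shared ancestry and a Poisson distribution for the expected number of substitutions in order to exclude relatedness between two sequences. The sketch below illustrates that general idea only; it is not the authors' published model or spreadsheet. The function names, the way shared ancestry is handled (an assumed `window_years` added to both lineages), the 0.05 cut-off and the example substitution rate are all hypothetical choices made for illustration.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def relatedness_p(observed_diffs: int,
                  rate_per_site_per_year: float,
                  n_sites: int,
                  years_between: float,
                  window_years: float = 0.0) -> float:
    """Probability of seeing at least `observed_diffs` substitutions.

    Assumption (not from the source): if the two samples share an ancestor
    that existed up to `window_years` before the earlier sample, the total
    evolutionary time separating them is the sampling interval plus two
    branches of length `window_years`.
    """
    total_time = years_between + 2.0 * window_years
    lam = rate_per_site_per_year * n_sites * total_time  # expected substitutions
    return poisson_tail(observed_diffs, lam)


def exclude_relatedness(observed_diffs: int,
                        rate_per_site_per_year: float,
                        n_sites: int,
                        years_between: float,
                        window_years: float = 0.0,
                        alpha: float = 0.05) -> bool:
    """True when the observed differences are improbable enough to exclude
    relatedness within the time frame (alpha is a hypothetical cut-off)."""
    p = relatedness_p(observed_diffs, rate_per_site_per_year, n_sites,
                      years_between, window_years)
    return p < alpha


if __name__ == "__main__":
    # Hypothetical inputs: a 450-site locus, an illustrative rate of
    # 1e-3 substitutions/site/year, samples taken six months apart,
    # three nucleotide differences observed.
    p = relatedness_p(3, 1e-3, 450, 0.5)
    print(f"P(>= 3 substitutions) = {p:.4f}")
    print("exclude relatedness:", exclude_relatedness(3, 1e-3, 450, 0.5))
```

As the Interpretation passage notes, the only external input such a calculation needs is a previously estimated substitution rate for the locus and pathogen of interest; everything else comes from the pair of samples being compared.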
first_indexed 2024-12-21T14:58:13Z
format Article
id doaj.art-7e26c8cbc6f94acc815e2513da73eecd
institution Directory of Open Access Journals
issn 2352-3964
language English
last_indexed 2024-12-21T14:58:13Z
publishDate 2022-05-01
publisher Elsevier
record_format Article
series EBioMedicine
spelling doaj.art-7e26c8cbc6f94acc815e2513da73eecd
2022-12-21T18:59:41Z
eng
Elsevier
EBioMedicine, ISSN 2352-3964, 2022-05-01, volume 79, article 103989
Mind your Ps: A probabilistic model to aid the interpretation of molecular epidemiology data
Ana Raquel Penedos (0): Virus Reference Department, United Kingdom Health Security Agency, London NW9 5EQ, United Kingdom; Corresponding author.
Aurora Fernández-García (1): National Reference Laboratory for Measles and Rubella, Centro Nacional de Microbiología, Instituto de Salud Carlos III, Madrid, Majadahonda, Spain; CIBER de Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
Mihaela Lazar (2): Cantacuzino, National Military-Medical Institute for Research and Development, Bucharest, Romania
Kajal Ralh (3): Virus Reference Department, United Kingdom Health Security Agency, London NW9 5EQ, United Kingdom
David Williams (4): Virus Reference Department, United Kingdom Health Security Agency, London NW9 5EQ, United Kingdom
Kevin E. Brown (5): Virus Reference Department, United Kingdom Health Security Agency, London NW9 5EQ, United Kingdom; Immunisation and Countermeasures, United Kingdom Health Security Agency, London NW9 5EQ, United Kingdom
title Mind your Ps: A probabilistic model to aid the interpretation of molecular epidemiology data
topic Measles
Outbreak
Elimination
Epidemiology
Molecular epidemiology
Clinical virology
url http://www.sciencedirect.com/science/article/pii/S2352396422001736