ReporTree: a surveillance-oriented tool to strengthen the linkage between pathogen genetic clusters and epidemiological data

Abstract Background Genomics-informed pathogen surveillance strengthens public health decision-making, playing an important role in infectious diseases’ prevention and control. A pivotal outcome of genomics surveillance is the identification of pathogen genetic clusters and their characterization in...

Bibliographic Details
Main Authors: Verónica Mixão, Miguel Pinto, Daniel Sobral, Adriano Di Pasquale, João Paulo Gomes, Vítor Borges
Format: Article
Language: English
Published: BMC 2023-06-01
Series: Genome Medicine
Subjects: ReporTree, Genetic clustering, Genomic surveillance, Public health, Automated pipeline
Online Access: https://doi.org/10.1186/s13073-023-01196-1
author Verónica Mixão
Miguel Pinto
Daniel Sobral
Adriano Di Pasquale
João Paulo Gomes
Vítor Borges
collection DOAJ
description Abstract
Background: Genomics-informed pathogen surveillance strengthens public health decision-making, playing an important role in the prevention and control of infectious diseases. A pivotal outcome of genomics surveillance is the identification of pathogen genetic clusters and their characterization in terms of geotemporal spread or linkage to clinical and demographic data. This task often consists of the visual exploration of (large) phylogenetic trees and associated metadata, which is time-consuming and difficult to reproduce.
Results: We developed ReporTree, a flexible bioinformatics pipeline that allows diving into the complexity of pathogen diversity to rapidly identify genetic clusters at any (or all) distance threshold(s) or cluster stability regions and to generate surveillance-oriented reports based on the available metadata, such as timespan, geography, or vaccination/clinical status. ReporTree is able to maintain cluster nomenclature in subsequent analyses and to generate a nomenclature code combining cluster information at different hierarchical levels, thus facilitating the active surveillance of clusters of interest. By handling several input formats and clustering methods, ReporTree is applicable to multiple pathogens, constituting a flexible resource that can be smoothly deployed in routine surveillance bioinformatics workflows with negligible computational and time costs. This is demonstrated through a comprehensive benchmarking of (i) the cg/wgMLST workflow with large datasets of four foodborne bacterial pathogens and (ii) the alignment-based SNP workflow with a large dataset of Mycobacterium tuberculosis. To further validate this tool, we reproduced a previous large-scale study on Neisseria gonorrhoeae, demonstrating how ReporTree is able to rapidly identify the main species genogroups and characterize them with key surveillance metadata, such as antibiotic resistance data. By providing examples for SARS-CoV-2 and the foodborne bacterial pathogen Listeria monocytogenes, we show how this tool is currently a useful asset in genomics-informed routine surveillance and outbreak detection of a wide variety of species.
Conclusions: In summary, ReporTree is a pan-pathogen tool for automated and reproducible identification and characterization of genetic clusters that contributes to sustainable and efficient genomics-informed public health surveillance of pathogens. ReporTree is implemented in Python 3.8 and is freely available at https://github.com/insapathogenomics/ReporTree.
first_indexed 2024-03-13T04:48:33Z
format Article
id doaj.art-864de65eb9b24deeb4602c48cd8b88f1
institution Directory Open Access Journal
issn 1756-994X
language English
last_indexed 2024-03-13T04:48:33Z
publishDate 2023-06-01
publisher BMC
record_format Article
series Genome Medicine
spelling doaj.art-864de65eb9b24deeb4602c48cd8b88f1
2023-06-18T11:21:10Z
eng
BMC
Genome Medicine
1756-994X
2023-06-01
15
1
1
12
10.1186/s13073-023-01196-1
ReporTree: a surveillance-oriented tool to strengthen the linkage between pathogen genetic clusters and epidemiological data
Verónica Mixão, Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA)
Miguel Pinto, Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA)
Daniel Sobral, Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA)
Adriano Di Pasquale, National Reference Centre (NRC) for Whole Genome Sequencing of Microbial Pathogens: Database and Bioinformatics analysis (GENPAT), Istituto Zooprofilattico Sperimentale Dell’Abruzzo E del Molise “Giuseppe Caporale” (IZSAM)
João Paulo Gomes, Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA)
Vítor Borges, Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA)
https://doi.org/10.1186/s13073-023-01196-1
ReporTree
Genetic clustering
Genomic surveillance
Public health
Automated pipeline
title ReporTree: a surveillance-oriented tool to strengthen the linkage between pathogen genetic clusters and epidemiological data
topic ReporTree
Genetic clustering
Genomic surveillance
Public health
Automated pipeline
url https://doi.org/10.1186/s13073-023-01196-1