Deep clustering of bacterial tree images

Bibliographic details
Main Authors: Hayati, M, Chindelevitch, L, Aanensen, D, Colijn, C
Format: Journal article
Language: English
Published: Royal Society 2022
Summary: The field of genomic epidemiology is rapidly growing as many jurisdictions begin to deploy whole-genome sequencing (WGS) in their national or regional pathogen surveillance programmes. WGS data offer a rich view of the shared ancestry of a set of taxa, typically visualized with phylogenetic trees illustrating the clusters or subtypes present in a group of taxa, their relatedness and the extent of diversification within and between them. When methicillin-resistant Staphylococcus aureus (MRSA) arose and disseminated widely, phylogenetic trees of MRSA-containing types of S. aureus had a distinctive 'comet' shape, with a 'comet head' of recently adapted drug-resistant isolates in the context of a 'comet tail' that was predominantly drug-sensitive. Placing an S. aureus isolate in the context of such a 'comet' helped public health laboratories interpret local data within the broader setting of S. aureus evolution. In this work, we ask what other tree shapes, analogous to the MRSA comet, are present in bacterial WGS datasets. We extract trees from large bacterial genomic datasets, visualize them as images and cluster the images. We find nine major groups of tree images, including the 'comets', star-like phylogenies, 'barbell' phylogenies and other shapes, and comment on the evolutionary and epidemiological stories these shapes might illustrate. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.