SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data

Streptococcus pneumoniae is responsible for 240 000–460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have...

Bibliographic Details
Main Authors: Epping, L, Van Tonder, A, Gladstone, R, The Global Pneumococcal Sequencing Consortium, Bentley, S, Page, A, Keane, J, Turner, P
Format: Journal article
Language: English
Published: Microbiology Society 2018
collection OXFORD
description Streptococcus pneumoniae is responsible for 240 000–460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a k-mer approach. We compare SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98 % concordance using a k-mer-based method, can process 10 000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 15–21×. SeroBA is implemented in Python3 and is freely available under an open source GPLv3 licence from: https://github.com/sanger-pathogens/seroba
id oxford-uuid:6faaaf5f-a492-4d51-a58d-55302b3b45be
institution University of Oxford