A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes.

<h4>Background:</h4> <p>Highly parallel, ‘second generation’ sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly availab...


Bibliographic Details
Main Authors: Bratcher, H, Corton, C, Jolley, K, Parkhill, J, Maiden, M
Format: Journal article
Language: English
Published: BioMed Central 2014
_version_ 1826279321771180032
author Bratcher, H
Corton, C
Jolley, K
Parkhill, J
Maiden, M
collection OXFORD
description <h4>Background:</h4> <p>Highly parallel, ‘second generation’ sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read sequence files that require extensive processing before they can be used for analysis. The provision of data in a uniform format, which can be easily assessed for quality, linked to provenance and phenotype and used for analysis, is therefore necessary.</p> <h4>Results:</h4> <p>The performance of de novo short-read assembly followed by automatic annotation using the PubMLST.org Neisseria database was assessed and evaluated for 108 diverse, representative, and well-characterised Neisseria meningitidis isolates. High-quality sequences were obtained for &gt;99% of known meningococcal genes among the de novo assembled genomes and four resequenced genomes, and less than 1% of reassembled genes had sequence discrepancies or misassembled sequences. A core genome of 1600 loci, present in at least 95% of the population, was determined using the Genome Comparator tool. Genealogical relationships compatible with, but at a higher resolution than, those identified by multilocus sequence typing were obtained with core genome comparisons and ribosomal protein gene analysis, which revealed a genomic structure for a number of previously described phenotypes. This unified system for cataloguing Neisseria genetic variation in the genome was implemented and used for multiple analyses, and the data are publicly available in the PubMLST Neisseria database.</p> <h4>Conclusions:</h4> <p>The de novo assembly, combined with automated gene-by-gene annotation, generates high-quality draft genomes in which the majority of protein-encoding genes are present with high accuracy. The approach catalogues diversity efficiently, permits analyses of a single genome or multiple genome comparisons, and is a practical approach to interpreting WGS data for large bacterial population samples. The method generates novel insights into the biology of the meningococcus and improves our understanding of the whole population structure, not just disease-causing lineages.</p>
first_indexed 2024-03-06T23:56:58Z
format Journal article
id oxford-uuid:748f0f59-3c83-4e77-925e-02862794db89
institution University of Oxford
language English
last_indexed 2024-03-06T23:56:58Z
publishDate 2014
publisher BioMed Central
record_format dspace
spelling oxford-uuid:748f0f59-3c83-4e77-925e-02862794db89
2022-03-26T20:03:47Z
A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes.
Journal article
http://purl.org/coar/resource_type/c_dcae04bc
uuid:748f0f59-3c83-4e77-925e-02862794db89
English
Symplectic Elements at Oxford
BioMed Central
2014
Bratcher, H
Corton, C
Jolley, K
Parkhill, J
Maiden, M
title A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes.