PanGraph: scalable bacterial pan-genome graph construction
Main Authors: | Noll, N; Molari, M; Shaw, LP; Neher, RA |
---|---|
Format: | Journal article |
Language: | English |
Published: | Microbiology Society, 2023 |
author | Noll, N; Molari, M; Shaw, LP; Neher, RA |
---|---|
collection | OXFORD |
description | The genomic diversity of microbes is commonly parameterized as SNPs relative to a reference genome of a well-characterized, but arbitrary, isolate. However, any reference genome contains only a fraction of the microbial pangenome, the total set of genes observed in a given species. Reference-based approaches are thus blind to the dynamics of the accessory genome, as well as to variation in gene order and copy number. With the widespread use of long-read sequencing, the number of high-quality, complete genome assemblies has increased dramatically. In addition to pangenomic approaches that focus on variation in the sets of genes present in different genomes, complete assemblies allow investigations of the evolution of genome structure and gene order. This latter problem, however, is computationally demanding, and few tools are available that shed light on these dynamics. Here, we present PanGraph, a Julia-based library and command line interface for aligning whole genomes into a graph. Each genome is represented as a path along vertices, which in turn encapsulate homologous multiple sequence alignments. The resultant data structure succinctly summarizes population-level nucleotide and structural polymorphisms and can be exported into several common formats for either downstream analysis or immediate visualization. |
format | Journal article |
id | oxford-uuid:8de0af73-7382-46ac-aef6-0843f1173d22 |
institution | University of Oxford |
language | English |
publishDate | 2023 |
publisher | Microbiology Society |
title | PanGraph: scalable bacterial pan-genome graph construction |
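The data model in the description (each genome is a path along vertices, and each vertex encapsulates a multiple sequence alignment of homologous segments) can be illustrated with a minimal Python sketch. This is not PanGraph's Julia API or any of its export formats. The names `Block`, `Path`, `reconstruct` and `core_blocks` are hypothetical, as is the convention that the k-th visit of a block by a genome corresponds to that block's alignment row `(genome, k)`.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# Complement table for DNA; 'N' and the gap character '-' map to themselves.
COMPLEMENT = str.maketrans("ACGTN-", "TGCAN-")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Block:
    """Hypothetical graph vertex holding one multiple sequence alignment.

    `rows` maps (genome, copy_index) to an aligned row stored in the block's
    own orientation; '-' marks an alignment gap.
    """
    block_id: str
    rows: dict[tuple[str, int], str] = field(default_factory=dict)


@dataclass
class Path:
    """Hypothetical genome path: ordered (block_id, strand) steps, strand is +1 or -1."""
    genome: str
    steps: list[tuple[str, int]]


def reconstruct(path: Path, blocks: dict[str, Block]) -> str:
    """Rebuild a genome by concatenating its ungapped block rows along the path.

    Assumption: the k-th visit of a block by a genome uses row (genome, k).
    Steps on the -1 strand contribute the reverse complement of the row.
    """
    visits: Counter[str] = Counter()
    parts = []
    for block_id, strand in path.steps:
        row = blocks[block_id].rows[(path.genome, visits[block_id])]
        visits[block_id] += 1
        seq = row.replace("-", "")  # drop alignment gaps
        parts.append(seq if strand == 1 else reverse_complement(seq))
    return "".join(parts)


def core_blocks(paths: list[Path]) -> set[str]:
    """Blocks visited exactly once by every genome (single-copy core)."""
    counts = [Counter(block_id for block_id, _ in p.steps) for p in paths]
    shared = set(counts[0]).intersection(*counts[1:])
    return {b for b in shared if all(c[b] == 1 for c in counts)}


# Toy graph: A is shared by both genomes, B is accessory (g1 only),
# and C occurs twice in g2, once inverted.
blocks = {
    "A": Block("A", {("g1", 0): "ACGT-A", ("g2", 0): "ACGTTA"}),
    "B": Block("B", {("g1", 0): "GGC"}),
    "C": Block("C", {("g1", 0): "TTA", ("g2", 0): "TTA", ("g2", 1): "TTG"}),
}
paths = {
    "g1": Path("g1", [("A", 1), ("B", 1), ("C", 1)]),
    "g2": Path("g2", [("C", -1), ("A", 1), ("C", 1)]),
}

for name, p in paths.items():
    print(name, reconstruct(p, blocks))
# g1 ACGTAGGCTTA
# g2 TAAACGTTATTG
print("core:", sorted(core_blocks(list(paths.values()))))
# core: ['A']
```

In this sketch only `A` is reported as single-copy core: `B` is missing from `g2` and `C` is duplicated there. Storing each block's rows in a single orientation and recording strand on the path lets an inversion appear as a change of strand in a genome's path rather than as new block content, while nucleotide differences stay inside the per-block alignment.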