seq-seq-pan: building a computational pan-genome data structure on whole genome alignment

Abstract Background The increasing application of next generation sequencing technologies has led to the availability of thousands of reference genomes, often providing multiple genomes for the same or closely related species. The current approach to represent a species or a population with a single...

Bibliographic Details
Main Authors:	Christine Jandrasits, Piotr W. Dabrowski, Stephan Fuchs, Bernhard Y. Renard
Format:	Article
Language:	English
Published:	BMC 2018-01-01
Series:	BMC Genomics
Subjects:	Pan-genome, Data structure, Whole genome alignment
Online Access:	http://link.springer.com/article/10.1186/s12864-017-4401-3

_version_	1828838878362992640
author	Christine Jandrasits, Piotr W. Dabrowski, Stephan Fuchs, Bernhard Y. Renard
author_facet	Christine Jandrasits, Piotr W. Dabrowski, Stephan Fuchs, Bernhard Y. Renard
author_sort	Christine Jandrasits
collection	DOAJ
description	Abstract Background: The increasing application of next-generation sequencing technologies has led to the availability of thousands of reference genomes, often providing multiple genomes for the same or closely related species. The current approach of representing a species or a population with a single reference sequence and a set of variations cannot represent their full diversity and introduces bias towards the chosen reference. There is a need for a representation of multiple sequences in a composite way that is compatible with existing data sources for annotation and suitable for established sequence analysis methods. At the same time, this representation needs to be easily accessible and extendable to account for the constant change of available genomes. Results: We introduce seq-seq-pan, a framework that provides methods for adding genomes to, or removing them from, a set of aligned genomes and uses these methods to construct a whole genome alignment. Throughout the sequential workflow, the alignment is optimized for generating a representative linear presentation of the aligned set of genomes that enables its use for annotation and in downstream analyses. Conclusions: By providing dynamic updates and optimized processing, our approach enables the use of whole genome alignment in the field of pan-genomics. In addition, the sequential workflow can be used as a fast alternative to existing whole genome aligners for aligning closely related genomes. seq-seq-pan is freely available at https://gitlab.com/rki_bioinformatics
first_indexed	2024-12-12T19:12:05Z
format	Article
id	doaj.art-5f7883cf9b854d52867d119c86ddf8a3
institution	Directory of Open Access Journals
issn	1471-2164
language	English
last_indexed	2024-12-12T19:12:05Z
publishDate	2018-01-01
publisher	BMC
record_format	Article
series	BMC Genomics
spelling	doaj.art-5f7883cf9b854d52867d119c86ddf8a3 | 2022-12-22T00:14:49Z | eng | BMC | BMC Genomics | 1471-2164 | 2018-01-01 | 19 | 1 | 1 | 12 | 10.1186/s12864-017-4401-3 | seq-seq-pan: building a computational pan-genome data structure on whole genome alignment | Christine Jandrasits (0) | Piotr W. Dabrowski (1) | Stephan Fuchs (2) | Bernhard Y. Renard (3) | (0) Robert Koch Institute | (1) Robert Koch Institute | (2) Robert Koch Institute, Wernigerode Branch | (3) Robert Koch Institute | Abstract Background The increasing application of next generation sequencing technologies has led to the availability of thousands of reference genomes, often providing multiple genomes for the same or closely related species. The current approach to represent a species or a population with a single reference sequence and a set of variations cannot represent their full diversity and introduces bias towards the chosen reference. There is a need for the representation of multiple sequences in a composite way that is compatible with existing data sources for annotation and suitable for established sequence analysis methods. At the same time, this representation needs to be easily accessible and extendable to account for the constant change of available genomes. Results We introduce seq-seq-pan, a framework that provides methods for adding or removing new genomes from a set of aligned genomes and uses these to construct a whole genome alignment. Throughout the sequential workflow the alignment is optimized for generating a representative linear presentation of the aligned set of genomes, that enables its usage for annotation and in downstream analyses. Conclusions By providing dynamic updates and optimized processing, our approach enables the usage of whole genome alignment in the field of pan-genomics. In addition, the sequential workflow can be used as a fast alternative to existing whole genome aligners for aligning closely related genomes. seq-seq-pan is freely available at https://gitlab.com/rki_bioinformatics | http://link.springer.com/article/10.1186/s12864-017-4401-3 | Pan-genome | Data structure | Whole genome alignment
spellingShingle	Christine Jandrasits Piotr W. Dabrowski Stephan Fuchs Bernhard Y. Renard seq-seq-pan: building a computational pan-genome data structure on whole genome alignment BMC Genomics Pan-genome Data structure Whole genome alignment
title	seq-seq-pan: building a computational pan-genome data structure on whole genome alignment
title_full	seq-seq-pan: building a computational pan-genome data structure on whole genome alignment
title_fullStr	seq-seq-pan: building a computational pan-genome data structure on whole genome alignment
title_full_unstemmed	seq-seq-pan: building a computational pan-genome data structure on whole genome alignment
title_short	seq-seq-pan: building a computational pan-genome data structure on whole genome alignment
title_sort	seq seq pan building a computational pan genome data structure on whole genome alignment
topic	Pan-genome, Data structure, Whole genome alignment
url	http://link.springer.com/article/10.1186/s12864-017-4401-3
work_keys_str_mv	AT christinejandrasits seqseqpanbuildingacomputationalpangenomedatastructureonwholegenomealignment AT piotrwdabrowski seqseqpanbuildingacomputationalpangenomedatastructureonwholegenomealignment AT stephanfuchs seqseqpanbuildingacomputationalpangenomedatastructureonwholegenomealignment AT bernhardyrenard seqseqpanbuildingacomputationalpangenomedatastructureonwholegenomealignment

Similar Items