PanDelos: a dictionary-based method for pan-genome content discovery



Bibliographic Details
Main Authors:	Vincenzo Bonnici, Rosalba Giugno, Vincenzo Manca
Format:	Article
Language:	English
Published:	BMC 2018-11-01
Series:	BMC Bioinformatics
Subjects:	Pan-genome; Distant genomes; k-mer dictionary
Online Access:	http://link.springer.com/article/10.1186/s12859-018-2417-6

author	Vincenzo Bonnici, Rosalba Giugno, Vincenzo Manca
author_sort	Vincenzo Bonnici
collection	DOAJ
description	Abstract. Background: Pan-genome approaches afford the discovery of homology relations in a set of genomes by determining how gene families are distributed among them. Retrieving a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes; in fact, all-against-all gene comparisons are required to solve the problem completely. In the presence of phylogenetically distant genomes, the variability introduced by gene duplication and transmission makes recognizing homologous genes even more difficult. A challenge in this field is to design fast and adaptive similarity measures that yield a suitable pan-genome structure of homology relations. Results: We present PanDelos, a standalone tool for discovering pan-genome content among phylogenetically distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Candidate homology relations are integrated into a global network, and groups of homologous genes are extracted by applying a community detection algorithm. Conclusions: PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running time and quality of content discovery. Tests were run on collections of real genomes previously used in analogous studies and on synthetic benchmarks that provide a fully trusted ground truth. The software is available at https://github.com/GiugnoLab/PanDelos.
first_indexed	2024-12-22T04:44:53Z
format	Article
id	doaj.art-8fea1f2d13f94465b3ee4f6fca7eb811
institution	Directory of Open Access Journals
issn	1471-2105
language	English
last_indexed	2024-12-22T04:44:53Z
publishDate	2018-11-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-8fea1f2d13f94465b3ee4f6fca7eb811; 2022-12-21T18:38:38Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-11-01; 19; S15; 47; 59; 10.1186/s12859-018-2417-6; Vincenzo Bonnici, Rosalba Giugno, Vincenzo Manca (all: Department of Computer Science, University of Verona)
title	PanDelos: a dictionary-based method for pan-genome content discovery
topic	Pan-genome; Distant genomes; k-mer dictionary
url	http://link.springer.com/article/10.1186/s12859-018-2417-6


Similar Items