Heuristic algorithms for best match graph editing

Abstract Background Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y wheneve...

Full description

Bibliographic Details
Main Authors: David Schaller, Manuela Geiß, Marc Hellmuth, Peter F. Stadler
Format: Article
Language:English
Published: BMC 2021-08-01
Series:Algorithms for Molecular Biology
Subjects:
Online Access:https://doi.org/10.1186/s13015-021-00196-3
_version_ 1819139897860030464
author David Schaller
Manuela Geiß
Marc Hellmuth
Peter F. Stadler
author_facet David Schaller
Manuela Geiß
Marc Hellmuth
Peter F. Stadler
author_sort David Schaller
collection DOAJ
description Abstract Background Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Results Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho’s supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. Conclusion Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.
first_indexed 2024-12-22T11:29:58Z
format Article
id doaj.art-0691ed3cc2fd411eb070e18cd3e69ee2
institution Directory Open Access Journal
issn 1748-7188
language English
last_indexed 2024-12-22T11:29:58Z
publishDate 2021-08-01
publisher BMC
record_format Article
series Algorithms for Molecular Biology
spelling doaj.art-0691ed3cc2fd411eb070e18cd3e69ee22022-12-21T18:27:39ZengBMCAlgorithms for Molecular Biology1748-71882021-08-0116113210.1186/s13015-021-00196-3Heuristic algorithms for best match graph editingDavid Schaller0Manuela Geiß1Marc Hellmuth2Peter F. Stadler3Max Planck Institute for Mathematics in the SciencesSoftware Competence Center Hagenberg GmbHDepartment of Mathematics, Faculty of Science, Stockholm UniversityMax Planck Institute for Mathematics in the SciencesAbstract Background Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Results Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho’s supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. Conclusion Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.https://doi.org/10.1186/s13015-021-00196-3Arc modification problemsHeuristic algorithmConsistent algorithmNP-hardness
spellingShingle David Schaller
Manuela Geiß
Marc Hellmuth
Peter F. Stadler
Heuristic algorithms for best match graph editing
Algorithms for Molecular Biology
Arc modification problems
Heuristic algorithm
Consistent algorithm
NP-hardness
title Heuristic algorithms for best match graph editing
title_full Heuristic algorithms for best match graph editing
title_fullStr Heuristic algorithms for best match graph editing
title_full_unstemmed Heuristic algorithms for best match graph editing
title_short Heuristic algorithms for best match graph editing
title_sort heuristic algorithms for best match graph editing
topic Arc modification problems
Heuristic algorithm
Consistent algorithm
NP-hardness
url https://doi.org/10.1186/s13015-021-00196-3
work_keys_str_mv AT davidschaller heuristicalgorithmsforbestmatchgraphediting
AT manuelageiß heuristicalgorithmsforbestmatchgraphediting
AT marchellmuth heuristicalgorithmsforbestmatchgraphediting
AT peterfstadler heuristicalgorithmsforbestmatchgraphediting