Heuristic algorithms for best match graph editing
Abstract Background Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y wheneve...
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
BMC
2021-08-01
|
Series: | Algorithms for Molecular Biology |
Subjects: | |
Online Access: | https://doi.org/10.1186/s13015-021-00196-3 |
_version_ | 1819139897860030464 |
---|---|
author | David Schaller Manuela Geiß Marc Hellmuth Peter F. Stadler |
author_facet | David Schaller Manuela Geiß Marc Hellmuth Peter F. Stadler |
author_sort | David Schaller |
collection | DOAJ |
description | Abstract Background Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Results Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho’s supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. Conclusion Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods. |
first_indexed | 2024-12-22T11:29:58Z |
format | Article |
id | doaj.art-0691ed3cc2fd411eb070e18cd3e69ee2 |
institution | Directory Open Access Journal |
issn | 1748-7188 |
language | English |
last_indexed | 2024-12-22T11:29:58Z |
publishDate | 2021-08-01 |
publisher | BMC |
record_format | Article |
series | Algorithms for Molecular Biology |
spelling | doaj.art-0691ed3cc2fd411eb070e18cd3e69ee22022-12-21T18:27:39ZengBMCAlgorithms for Molecular Biology1748-71882021-08-0116113210.1186/s13015-021-00196-3Heuristic algorithms for best match graph editingDavid Schaller0Manuela Geiß1Marc Hellmuth2Peter F. Stadler3Max Planck Institute for Mathematics in the SciencesSoftware Competence Center Hagenberg GmbHDepartment of Mathematics, Faculty of Science, Stockholm UniversityMax Planck Institute for Mathematics in the SciencesAbstract Background Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Results Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho’s supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. Conclusion Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.https://doi.org/10.1186/s13015-021-00196-3Arc modification problemsHeuristic algorithmConsistent algorithmNP-hardness |
spellingShingle | David Schaller Manuela Geiß Marc Hellmuth Peter F. Stadler Heuristic algorithms for best match graph editing Algorithms for Molecular Biology Arc modification problems Heuristic algorithm Consistent algorithm NP-hardness |
title | Heuristic algorithms for best match graph editing |
title_full | Heuristic algorithms for best match graph editing |
title_fullStr | Heuristic algorithms for best match graph editing |
title_full_unstemmed | Heuristic algorithms for best match graph editing |
title_short | Heuristic algorithms for best match graph editing |
title_sort | heuristic algorithms for best match graph editing |
topic | Arc modification problems Heuristic algorithm Consistent algorithm NP-hardness |
url | https://doi.org/10.1186/s13015-021-00196-3 |
work_keys_str_mv | AT davidschaller heuristicalgorithmsforbestmatchgraphediting AT manuelageiß heuristicalgorithmsforbestmatchgraphediting AT marchellmuth heuristicalgorithmsforbestmatchgraphediting AT peterfstadler heuristicalgorithmsforbestmatchgraphediting |