Phylo2Vec: a vector representation for binary trees

Binary phylogenetic trees inferred from biological data are central to understanding the shared history among evolutionary units. However, inferring the placement of latent nodes in a tree is computationally expensive. State-of-the-art methods rely on carefully designed heuristics for tree...

Bibliographic Details
Main Authors: Penn, MJ, Scheidwasser, N, Khurana, MP, Duchêne, DA, Donnelly, CA, Bhatt, S
Format: Journal article
Language: English
Published: Oxford University Press 2024
collection OXFORD
description Binary phylogenetic trees inferred from biological data are central to understanding the shared history among evolutionary units. However, inferring the placement of latent nodes in a tree is computationally expensive. State-of-the-art methods rely on carefully designed heuristics for tree search, using different data structures for easy manipulation (e.g., classes in object-oriented programming languages) and readable representation of trees (e.g., Newick-format strings). Here, we present Phylo2Vec, a parsimonious encoding for phylogenetic trees that serves as a unified approach for both manipulating and representing phylogenetic trees. Phylo2Vec maps any binary tree with n leaves to a unique integer vector of length n − 1. The advantages of Phylo2Vec are fourfold: (i) fast tree sampling, (ii) compressed tree representation compared to a Newick string, (iii) quick and unambiguous verification if two binary trees are identical topologically, and (iv) systematic ability to traverse tree space in very large or small jumps. As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets and show that a simple hill-climbing-based optimisation scheme can efficiently traverse the vastness of tree space from a random to an optimal tree.
first_indexed 2024-09-25T04:23:27Z
id oxford-uuid:4f2cb9e6-9607-4fa1-9249-a18d5f655b85
institution University of Oxford
last_indexed 2024-09-25T04:23:27Z