A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species

Bibliographic Details
Main Authors: Benjamin D. Redelings, Mark T. Holder
Format: Article
Language: English
Published: PeerJ Inc. 2017-03-01
Series: PeerJ
Online Access: https://peerj.com/articles/3058.pdf
collection DOAJ
description We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees. In addition to producing a supertree, our method computes annotations that describe which groupings in the input trees support, and which conflict with, each group in the supertree. We compare our method to a previously published supertree construction method by assessing the performance of both on the input trees used to construct the Open Tree of Life version 4. Our method increases the number of displayed input splits from 35,518 to 39,639 and decreases the number of conflicting input splits from 2,760 to 1,357. The new method also improves on the previous one in that it produces no unsupported branches and avoids unnecessary polytomies. The Open Tree of Life project currently uses this pipeline to produce every version of the project’s “synthetic tree” from version 5 onward. The software pipeline is called “propinquity”. It relies heavily on “otcetera”, a set of C++ tools that perform most of the steps of the pipeline. All of the components are free software and are available on GitHub.
id doaj.art-dc102ef28b074cefbc89e4c119a6ae57
institution Directory Open Access Journal
issn 2167-8359
citation Benjamin D. Redelings, Mark T. Holder. A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species. PeerJ 5:e3058, 2017-03-01.
doi 10.7717/peerj.3058
affiliation Benjamin D. Redelings: Department of Biology, Duke University, Durham, NC, United States
Mark T. Holder: Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, United States
topic Supertree
Phylogenetics
Taxonomy
Software
Tree of life