Short branch attraction in phylogenomic inference under the multispecies coalescent

Bibliographic Details
Main Authors: Liang Liu, Lili Yu, Shaoyuan Wu, Jonathan Arnold, Christopher Whalen, Charles Davis, Scott Edwards
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-06-01
Series: Frontiers in Ecology and Evolution
Online Access:https://www.frontiersin.org/articles/10.3389/fevo.2023.1134764/full
author Liang Liu
Lili Yu
Shaoyuan Wu
Jonathan Arnold
Christopher Whalen
Charles Davis
Scott Edwards
collection DOAJ
description Accurate reconstruction of species trees often relies on the quality of input gene trees estimated from molecular sequences. Previous studies suggested that, for a fixed sequence length, maximum likelihood may produce biased gene trees that subsequently mislead species tree inference. Two key questions arise in this context: which scenarios may result in consistently biased gene trees, and, for those scenarios, are there remedies that remove or at least reduce the misleading effects of consistently biased gene trees? In this article, we establish a theoretical framework to address these questions. Considering a scenario where the true gene tree is a 4-taxon star tree T∗ = (S1, S2, S3, S4) with two short branches leading to species S1 and S2, we demonstrate that maximum likelihood significantly favors the wrong bifurcating tree [(S1, S2), S3, S4], which groups the two species with short branches. We call this inconsistent behavior short branch attraction; it may occur in real-world data involving a 4-taxon bifurcating gene tree with a short internal branch. If no mutation occurs along the internal branch, which is likely when that branch is short, the 4-taxon bifurcating tree is equivalent to the 4-taxon star tree and is therefore subject to the same misleading effect of short branch attraction. Theoretical and simulation results further demonstrate that short branch attraction may occur in gene trees and species trees of arbitrary size. Moreover, short branch attraction is primarily caused by a lack of phylogenetic information in sequence data, suggesting that converting short internal branches to polytomies in the estimated gene trees can significantly reduce artifacts induced by short branch attraction.
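The intuition behind the star-tree scenario can be illustrated with a small simulation. This is a minimal sketch, assuming the Jukes-Cantor (JC69) substitution model and hypothetical branch lengths (0.01 for S1 and S2, 1.0 for S3 and S4) chosen only for illustration. It simulates alignment columns on the star tree T∗ and counts the sites whose pattern supports each of the three possible 4-taxon splits. Site-pattern counting is a simplified stand-in used here; it is not maximum likelihood and not the authors' analysis. The function names `jc69_transition`, `simulate_star`, and `split_support` are hypothetical.

```python
import numpy as np


def jc69_transition(t):
    """4x4 Jukes-Cantor (JC69) transition matrix for a branch of length t
    (expected substitutions per site)."""
    decay = np.exp(-4.0 * t / 3.0)
    p = np.full((4, 4), 0.25 - 0.25 * decay)  # P(change to one specific other state)
    np.fill_diagonal(p, 0.25 + 0.75 * decay)  # P(same state at both ends)
    return p


def simulate_star(branch_lengths, n_sites, rng):
    """Simulate n_sites independent columns on a star tree: every leaf evolves
    independently from a shared root state drawn uniformly from 4 states."""
    root = rng.integers(0, 4, size=n_sites)
    leaves = []
    for t in branch_lengths:
        cum = np.cumsum(jc69_transition(t)[root], axis=1)  # per-site CDF over 4 states
        cum[:, -1] = 1.0  # guard against floating-point round-off
        u = rng.random(n_sites)
        leaves.append((u[:, None] < cum).argmax(axis=1))  # inverse-CDF sampling
    return np.stack(leaves)  # shape (4, n_sites): rows are S1..S4


def split_support(seqs):
    """Count sites with patterns xxyy, xyxy and xyyx (x != y), which support the
    splits {S1,S2}|{S3,S4}, {S1,S3}|{S2,S4} and {S1,S4}|{S2,S3}."""
    s1, s2, s3, s4 = seqs
    return {
        "12|34": int(np.sum((s1 == s2) & (s3 == s4) & (s1 != s3))),
        "13|24": int(np.sum((s1 == s3) & (s2 == s4) & (s1 != s2))),
        "14|23": int(np.sum((s1 == s4) & (s2 == s3) & (s1 != s2))),
    }


rng = np.random.default_rng(2023)
# Hypothetical branch lengths: short to S1 and S2, long to S3 and S4.
seqs = simulate_star([0.01, 0.01, 1.0, 1.0], n_sites=10_000, rng=rng)
counts = split_support(seqs)
print(counts)
```

Because the short branches usually leave S1 and S2 in the root state, any site where the long branches to S3 and S4 happen to change to the same new state looks like support for (S1, S2), even though the true tree has no internal branch. Patterns supporting the other two splits require a change on one of the short branches and are correspondingly rare.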
first_indexed 2024-03-13T02:46:19Z
id doaj.art-532ee409f71a42fbb1a4ef315300aba6
institution Directory of Open Access Journals
issn 2296-701X
last_indexed 2024-03-13T02:46:19Z
doi 10.3389/fevo.2023.1134764
volume 11
article_number 1134764
record_updated 2023-06-28T15:23:57Z
affiliation Liang Liu: Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA, United States
Lili Yu: Department of Biostatistics, Georgia Southern University, Statesboro, GA, United States
Shaoyuan Wu: Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, Jiangsu International Joint Center of Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
Jonathan Arnold: Department of Genetics, University of Georgia, Athens, GA, United States
Christopher Whalen: Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA, United States
Charles Davis: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States
Scott Edwards: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States
topic coalescent methods
species trees
gene trees
multispecies coalescent model
long branch attraction
short branch attraction