The impact of outgroup choice and missing data on major seed plant phylogenetics using genome-wide EST data.

Genome level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole genome DNA sequences of hundreds of organisms and large-scale EST databases a large number of candidate genes for inclusion into phylogenetic analysis have become available....


Bibliographic Details
Main Authors: Jose Eduardo de la Torre-Bárcena, Sergios-Orestis Kolokotronis, Ernest K Lee, Dennis Wm Stevenson, Eric D Brenner, Manpreet S Katari, Gloria M Coruzzi, Rob DeSalle
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2009-06-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2685480?pdf=render
_version_ 1818889720602558464
author Jose Eduardo de la Torre-Bárcena
Sergios-Orestis Kolokotronis
Ernest K Lee
Dennis Wm Stevenson
Eric D Brenner
Manpreet S Katari
Gloria M Coruzzi
Rob DeSalle
collection DOAJ
description Genome-level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole genome DNA sequences of hundreds of organisms and large-scale EST databases, a large number of candidate genes for inclusion in phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes to address one of the more important plant phylogenetic questions concerning the hierarchical relationships of the several major seed plant lineages (angiosperms, Cycadales, Ginkgoales, Gnetales, and Coniferales), which continues to be a work in progress, despite numerous studies using single, few or several genes and morphology datasets. Although most recent studies support the notion that gymnosperms and angiosperms are monophyletic and sister groups, they differ on the topological arrangements within each major group. We exploited the EST database to construct a supermatrix of DNA sequences (over 1,200 concatenated orthologous gene partitions for 17 taxa) to examine non-flowering seed plant relationships. This analysis employed programs that offer rapid and robust orthology determination of novel, short sequences from plant ESTs based on reference seed plant genomes. Our phylogenetic analysis retrieved an unbiased (with respect to gene choice), well-resolved and highly supported phylogenetic hypothesis that was robust to various outgroup combinations. We evaluated character support and the relative contribution of numerous variables (e.g. gene number, missing data, partitioning schemes, taxon sampling and outgroup choice) on tree topology, stability and support metrics. Our results indicate that while missing characters and order of addition of genes to an analysis do not influence branch support, inadequate taxon sampling and limited choice of outgroup(s) can lead to spurious inference of phylogeny when dealing with phylogenomic-scale data sets. As expected, support and resolution increase significantly as more informative characters are added, until reaching a threshold, beyond which support metrics stabilize, and the effect of adding conflicting characters is minimized.
first_indexed 2024-12-19T17:13:30Z
format Article
id doaj.art-c869cb6d37fe4806a2f2bc85dfcb99e0
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-19T17:13:30Z
publishDate 2009-06-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-c869cb6d37fe4806a2f2bc85dfcb99e0 2022-12-21T20:12:57Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2009-06-01 4 6 e5764 10.1371/journal.pone.0005764 http://europepmc.org/articles/PMC2685480?pdf=render
title The impact of outgroup choice and missing data on major seed plant phylogenetics using genome-wide EST data.
url http://europepmc.org/articles/PMC2685480?pdf=render
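The description above mentions a supermatrix of concatenated orthologous gene partitions in which some taxa lack data for some genes. As a rough illustration only, the following Python sketch shows one common way such a matrix can be assembled, padding absent taxa with a missing-data symbol. It is not the authors' pipeline: the function names (build_supermatrix, missing_fraction), the "?" symbol, and the toy sequences are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# One gene partition: taxon name -> aligned sequence (hypothetical toy format).
Alignment = Dict[str, str]


def build_supermatrix(
    partitions: List[Alignment], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Concatenate per-gene alignments; pad taxa absent from a gene with `missing`."""
    taxa = sorted({taxon for part in partitions for taxon in part})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    bounds: List[Tuple[int, int]] = []  # half-open column range of each partition
    start = 0
    for part in partitions:
        lengths = {len(seq) for seq in part.values()}
        if len(lengths) != 1:
            raise ValueError("each partition must be non-empty and aligned to one length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon with no sequence for this gene gets a run of missing symbols.
            rows[taxon].append(part.get(taxon, missing * width))
        bounds.append((start, start + width))
        start += width
    matrix = {taxon: "".join(chunks) for taxon, chunks in rows.items()}
    return matrix, bounds


def missing_fraction(matrix: Dict[str, str], missing: str = "?") -> Dict[str, float]:
    """Fraction of missing characters per taxon in the concatenated matrix."""
    return {taxon: seq.count(missing) / len(seq) for taxon, seq in matrix.items()}


# Toy data: the second gene has no sequence for Pinus.
genes = [
    {"Ginkgo": "ACGT", "Pinus": "ACGA", "Cycas": "ACGG"},
    {"Ginkgo": "TT", "Cycas": "TA"},
]
matrix, bounds = build_supermatrix(genes)
fractions = missing_fraction(matrix)
```

Keeping the partition boundaries alongside the matrix is a design choice that would let a later step apply per-gene models or add genes in different orders, in the spirit of the partitioning and gene-addition variables the description lists.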