ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models

Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used...


Bibliographic Details
Main Authors:	Roman Sloutsky, Kristen M Naegle
Format:	Article
Language:	English
Published:	eLife Sciences Publications Ltd 2019-10-01
Series:	eLife
Subjects:	protein domains homology trees ensembles
Online Access:	https://elifesciences.org/articles/47676

_version_	1811180558083948544
author	Roman Sloutsky, Kristen M Naegle
collection	DOAJ
description	Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used to better identify accurate evolutionary models. We subsampled from available ortholog sequences and measured the distribution of observed relationships between paralogs produced across hundreds of models inferred from the subsamples. We observed two important phenomena. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, topologies that are most consistent with the observations made in the ensemble are more accurate and we present a meta algorithm that exploits this property to improve model accuracy.
first_indexed	2024-04-11T09:05:20Z
format	Article
id	doaj.art-b4913379e0d54a1dbd77f4592d7f83df
institution	Directory of Open Access Journals
issn	2050-084X
language	English
last_indexed	2024-04-11T09:05:20Z
publishDate	2019-10-01
publisher	eLife Sciences Publications Ltd
record_format	Article
series	eLife
spelling	doaj.art-b4913379e0d54a1dbd77f4592d7f83df | 2022-12-22T04:32:40Z | eng | eLife Sciences Publications Ltd | eLife | 2050-084X | 2019-10-01 | 8 | 10.7554/eLife.47676 | ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models | Roman Sloutsky (https://orcid.org/0000-0002-0794-1255): Program in Computational and Systems Biology, Washington University, St. Louis, United States; Department for Biomedical Engineering, Washington University, St. Louis, United States; Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, United States; Center for Biological Systems Engineering, Washington University, St. Louis, United States | Kristen M Naegle (https://orcid.org/0000-0001-7146-9592): Department for Biomedical Engineering, Washington University, St. Louis, United States; Center for Biological Systems Engineering, Washington University, St. Louis, United States; Department of Biomedical Engineering, University of Virginia, Charlottesville, United States; Center for Public Health Genomics, University of Virginia, Charlottesville, United States | https://elifesciences.org/articles/47676 | protein domains homology trees ensembles
title	ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models
topic	protein domains homology trees ensembles
url	https://elifesciences.org/articles/47676


Similar Items