ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models

Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used...

Bibliographic Details
Main Authors: Roman Sloutsky, Kristen M Naegle
Format: Article
Language: English
Published: eLife Sciences Publications Ltd 2019-10-01
Series: eLife
Subjects: protein; domains; homology; trees; ensembles
Online Access: https://elifesciences.org/articles/47676
author Roman Sloutsky
Kristen M Naegle
collection DOAJ
description Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used to better identify accurate evolutionary models. We subsampled from available ortholog sequences and measured the distribution of observed relationships between paralogs produced across hundreds of models inferred from the subsamples. We observed two important phenomena. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, topologies that are most consistent with the observations made in the ensemble are more accurate and we present a meta algorithm that exploits this property to improve model accuracy.
first_indexed 2024-04-11T09:05:20Z
format Article
id doaj.art-b4913379e0d54a1dbd77f4592d7f83df
institution Directory Open Access Journal
issn 2050-084X
language English
last_indexed 2024-04-11T09:05:20Z
publishDate 2019-10-01
publisher eLife Sciences Publications Ltd
record_format Article
series eLife
spelling doaj.art-b4913379e0d54a1dbd77f4592d7f83df
2022-12-22T04:32:40Z
eng
eLife Sciences Publications Ltd
eLife
2050-084X
2019-10-01
8
10.7554/eLife.47676
ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models
Roman Sloutsky (0) https://orcid.org/0000-0002-0794-1255
Kristen M Naegle (1) https://orcid.org/0000-0001-7146-9592
(0) Program in Computational and Systems Biology, Washington University, St. Louis, United States; Department for Biomedical Engineering, Washington University, St. Louis, United States; Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, United States; Center for Biological Systems Engineering, Washington University, St. Louis, United States
(1) Department for Biomedical Engineering, Washington University, St. Louis, United States; Center for Biological Systems Engineering, Washington University, St. Louis, United States; Department of Biomedical Engineering, University of Virginia, Charlottesville, United States; Center for Public Health Genomics, University of Virginia, Charlottesville, United States
title ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models
topic protein
domains
homology
trees
ensembles
url https://elifesciences.org/articles/47676