ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models
Main Authors: | Roman Sloutsky, Kristen M Naegle |
---|---|
Format: | Article |
Language: | English |
Published: | eLife Sciences Publications Ltd, 2019-10-01 |
Series: | eLife |
Subjects: | |
Online Access: | https://elifesciences.org/articles/47676 |
_version_ | 1811180558083948544 |
---|---|
author | Roman Sloutsky; Kristen M Naegle |
author_facet | Roman Sloutsky; Kristen M Naegle |
author_sort | Roman Sloutsky |
collection | DOAJ |
description | Evolutionary reconstruction algorithms produce models of the evolutionary history of proteins or species. Such algorithms are highly sensitive to their inputs: the sequences used and their alignments. Here, we asked whether the variance introduced by selecting different input sequences could be used to better identify accurate evolutionary models. We subsampled from available ortholog sequences and measured the distribution of observed relationships between paralogs produced across hundreds of models inferred from the subsamples. We observed two important phenomena. First, the reproducibility of an all-sequence, single-alignment reconstruction, measured by comparing topologies inferred from 90% subsamples, directly correlates with the accuracy of that single-alignment reconstruction, producing a measurable value for something that has been traditionally unknowable. Second, topologies that are most consistent with the observations made in the ensemble are more accurate and we present a meta algorithm that exploits this property to improve model accuracy. |
first_indexed | 2024-04-11T09:05:20Z |
format | Article |
id | doaj.art-b4913379e0d54a1dbd77f4592d7f83df |
institution | Directory Open Access Journal |
issn | 2050-084X |
language | English |
last_indexed | 2024-04-11T09:05:20Z |
publishDate | 2019-10-01 |
publisher | eLife Sciences Publications Ltd |
record_format | Article |
series | eLife |
spelling | doaj.art-b4913379e0d54a1dbd77f4592d7f83df · 2022-12-22T04:32:40Z · eng · eLife Sciences Publications Ltd · eLife · 2050-084X · 2019-10-01 · 8 · 10.7554/eLife.47676 · ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models · Roman Sloutsky [0] https://orcid.org/0000-0002-0794-1255 · Kristen M Naegle [1] https://orcid.org/0000-0001-7146-9592 · [0] Program in Computational and Systems Biology, Washington University, St. Louis, United States; Department for Biomedical Engineering, Washington University, St. Louis, United States; Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, United States; Center for Biological Systems Engineering, Washington University, St. Louis, United States · [1] Department for Biomedical Engineering, Washington University, St. Louis, United States; Center for Biological Systems Engineering, Washington University, St. Louis, United States; Department of Biomedical Engineering, University of Virginia, Charlottesville, United States; Center for Public Health Genomics, University of Virginia, Charlottesville, United States · https://elifesciences.org/articles/47676 · protein domains homology trees ensembles |
spellingShingle | Roman Sloutsky Kristen M Naegle ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models eLife protein domains homology trees ensembles |
title | ASPEN, a methodology for reconstructing protein evolution with improved accuracy using ensemble models |
title_sort | aspen a methodology for reconstructing protein evolution with improved accuracy using ensemble models |
topic | protein domains homology trees ensembles |
url | https://elifesciences.org/articles/47676 |
work_keys_str_mv | AT romansloutsky aspenamethodologyforreconstructingproteinevolutionwithimprovedaccuracyusingensemblemodels AT kristenmnaegle aspenamethodologyforreconstructingproteinevolutionwithimprovedaccuracyusingensemblemodels |
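
The description above says that topologies most consistent with the observations made across an ensemble of subsample-derived models tend to be more accurate. The following is a minimal sketch of that general idea only, not the ASPEN algorithm itself: it assumes rooted trees written as nested tuples of leaf names and uses clade frequency as a simplified stand-in for the paralog relationships measured in the article. The function names `clades`, `clade_frequencies`, and `consistency`, and the toy ensemble, are hypothetical and chosen for illustration.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree of nested tuples."""
    if isinstance(tree, str):
        # A leaf contributes its own name and no internal clades.
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade defined by this internal node
    return leaves, found


def clade_frequencies(ensemble):
    """Fraction of ensemble trees in which each clade is observed."""
    counts = Counter()
    for tree in ensemble:
        counts.update(clades(tree)[1])
    return {clade: n / len(ensemble) for clade, n in counts.items()}


def consistency(tree, freqs):
    """Mean ensemble frequency of the clades found in one topology."""
    own = clades(tree)[1]
    return sum(freqs.get(clade, 0.0) for clade in own) / len(own)


# Toy ensemble standing in for topologies inferred from sequence subsamples
# (assumed data, not taken from the article).
ensemble = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),
]
freqs = clade_frequencies(ensemble)
best = max(ensemble, key=lambda t: consistency(t, freqs))
```

In this sketch the topology whose clades recur most often across the ensemble receives the highest score; the article's meta algorithm should be consulted for the actual scoring and search procedure.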