Protein language models trained on multiple sequence alignments learn phylogenetic relationships
Protein language models taking multiple sequence alignments as inputs capture protein structure and mutational effects. Here, the authors show that these models also encode phylogenetic relationships, and can disentangle correlations due to structural constraints from those due to phylogeny.
Main Authors: | Umberto Lupo, Damiano Sgarbossa, Anne-Florence Bitbol |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2022-10-01 |
Series: | Nature Communications |
Online Access: | https://doi.org/10.1038/s41467-022-34032-y |
_version_ | 1797986075097956352 |
---|---|
author | Umberto Lupo Damiano Sgarbossa Anne-Florence Bitbol |
author_sort | Umberto Lupo |
collection | DOAJ |
first_indexed | 2024-04-11T07:28:19Z |
format | Article |
id | doaj.art-0db1804a936447da86e928752078bb1b |
institution | Directory Open Access Journal |
issn | 2041-1723 |
language | English |
last_indexed | 2024-04-11T07:28:19Z |
publishDate | 2022-10-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Nature Communications |
spelling | doaj.art-0db1804a936447da86e928752078bb1b; 2022-12-22T04:37:01Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2022-10-01; 131111; 10.1038/s41467-022-34032-y; Protein language models trained on multiple sequence alignments learn phylogenetic relationships; Umberto Lupo (0), Damiano Sgarbossa (1), Anne-Florence Bitbol (2); (0) Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL); (1) Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL); (2) Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL); Protein language models taking multiple sequence alignments as inputs capture protein structure and mutational effects. Here, the authors show that these models also encode phylogenetic relationships, and can disentangle correlations due to structural constraints from those due to phylogeny.; https://doi.org/10.1038/s41467-022-34032-y |
title | Protein language models trained on multiple sequence alignments learn phylogenetic relationships |
url | https://doi.org/10.1038/s41467-022-34032-y |