Protein language models trained on multiple sequence alignments learn phylogenetic relationships

Protein language models that take multiple sequence alignments as inputs capture protein structure and mutational effects. Here, the authors show that these models also encode phylogenetic relationships and can disentangle correlations due to structural constraints from those due to phylogeny.

Bibliographic Details
Main Authors: Umberto Lupo, Damiano Sgarbossa, Anne-Florence Bitbol
Format: Article
Language: English
Published: Nature Portfolio, 2022-10-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-022-34032-y
Collection: DOAJ (Directory of Open Access Journals)
ISSN: 2041-1723
Record ID: doaj.art-0db1804a936447da86e928752078bb1b
Indexed: 2024-04-11T07:28:19Z
Affiliation (all authors): Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL)