Protein language models trained on multiple sequence alignments learn phylogenetic relationships

Protein language models taking multiple sequence alignments as inputs capture protein structure and mutational effects. Here, the authors show that these models also encode phylogenetic relationships, and can disentangle correlations due to structural constraints from those due to phylogeny.

Bibliographic Details
Main Authors: Umberto Lupo, Damiano Sgarbossa, Anne-Florence Bitbol
Format: Article
Language: English
Published: Nature Portfolio 2022-10-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-022-34032-y
Description
ISSN: 2041-1723