The intrinsic dimension of protein sequence evolution.


Bibliographic Details
Main Authors: Elena Facco, Andrea Pagnani, Elena Tea Russo, Alessandro Laio
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-04-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC6472826?pdf=render
Collection: DOAJ (Directory of Open Access Journals)
Description: It is well known that, in order to preserve its structure and function, a protein cannot change its sequence at random, but only by mutations occurring preferentially at specific locations. We here investigate quantitatively the amount of variability that is allowed in protein sequence evolution, by computing the intrinsic dimension (ID) of the sequences belonging to a selection of protein families. The ID is a measure of the number of independent directions that evolution can take starting from a given sequence. We find that the ID is practically constant for sequences belonging to the same family, and moreover it is very similar in different families, with values ranging between 6 and 12. These values are significantly smaller than the raw number of amino acids, confirming the importance of correlations between mutations in different sites. However, we demonstrate that correlations are not sufficient to explain the small value of the ID we observe in protein families. Indeed, we show that the ID of a set of protein sequences generated by maximum entropy models, an approach in which correlations are accounted for, is typically significantly larger than the value observed in natural protein families. We further prove that a critical factor to reproduce the natural ID is to take into consideration the phylogeny of sequences.
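This record does not say which estimator the authors use to compute the ID. As an illustration only, the sketch below assumes a two-nearest-neighbour (TwoNN-style) maximum-likelihood estimator, which infers the dimension from the ratio between each point's second and first nearest-neighbour distances. The function names `hamming_distances` and `two_nn_id`, the choice of a normalised Hamming distance between aligned sequences, and the synthetic test data are all assumptions made for this example, not details taken from the article.

```python
import numpy as np

def hamming_distances(seqs):
    """Pairwise fraction of differing positions between equal-length sequences.

    Assumption: sequences are already aligned to the same length.
    """
    arr = np.array([list(s) for s in seqs])
    return (arr[:, None, :] != arr[None, :, :]).mean(axis=2)

def two_nn_id(dist):
    """TwoNN-style maximum-likelihood ID estimate from a square distance matrix.

    For each point, mu = r2 / r1 is the ratio of its second to its first
    nearest-neighbour distance; the estimate is N / sum(log(mu)).
    """
    d = np.array(dist, dtype=float)      # copy, so the caller's matrix is untouched
    np.fill_diagonal(d, np.inf)          # exclude self-distances
    nearest = np.sort(d, axis=1)[:, :2]  # two smallest distances per row
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = (r1 > 0) & (r2 > r1)          # assumption: drop duplicates and exact ties
    mu = r2[keep] / r1[keep]
    return keep.sum() / np.log(mu).sum()

# Synthetic check: points on a 2-D plane linearly embedded in 5 coordinates.
rng = np.random.default_rng(0)
plane = rng.random((1000, 2))
x, y = plane[:, 0], plane[:, 1]
points = np.column_stack([x, y, x + y, x - y, 2 * x])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
estimate = two_nn_id(dist)
print(estimate)
```

On this synthetic data the estimate should land near 2, the number of independent directions, even though each point has 5 coordinates. For real sequences, `hamming_distances` would supply the distance matrix, but Hamming distances are discrete and produce many exact ties, so the tie-dropping rule above is a simplification that can bias the result.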
ISSN: 1553-734X, 1553-7358
Volume: 15, Issue: 4, Article: e1006767
DOI: 10.1371/journal.pcbi.1006767