Proteome-Wide Structural Computations Provide Insights into Empirical Amino Acid Substitution Matrices


Bibliographic Details
Main Authors: Pablo Aledo, Juan Carlos Aledo
Format: Article
Language: English
Published: MDPI AG 2023-01-01
Series: International Journal of Molecular Sciences
Subjects: amino acid substitution; fitness; genetic code; mutation; protein evolution; protein stability
Online Access: https://www.mdpi.com/1422-0067/24/1/796
author Pablo Aledo
Juan Carlos Aledo
collection DOAJ
description The relative contribution of mutation and selection to the amino acid substitution rates observed in empirical matrices is unclear. Herein, we present a neutral continuous fitness-stability model, inspired by the Arrhenius law ($q_{ij} = a_{ij} e^{-|\Delta\Delta G_{ij}|}$). The model postulates that the rate of amino acid substitution ($i \rightarrow j$) is determined by the product of a pre-exponential factor, which is influenced by the genetic code structure, and an exponential term reflecting the relative fitness of the amino acid substitutions. To assess the validity of our model, we computed changes in stability of 14,094 proteins, for which 137,073,638 in silico mutants were analyzed. These site-specific data were summarized into a 20 × 20 matrix, whose entries, $|\Delta\Delta G_{ij}|$, were obtained after averaging over all the sites in all the proteins. We found a significant positive correlation between these energy values and the disease-causing potential of each substitution, suggesting that the exponential term accurately summarizes the fitness effect. A remarkable observation was that amino acids that were highly destabilizing when acting as the source tended to have little effect when acting as the destination, and vice versa (source $\rightarrow$ destination). The Arrhenius model accurately reproduced the pattern of substitution rates collected in the empirical matrices, suggesting a relevant role for the genetic code structure and a tuning role for purifying selection exerted via protein stability.
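The rate formula in the description can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input layout (a list of `(source, destination, ddG)` tuples) and the example numbers are hypothetical. The description does not say whether the absolute value is taken before or after averaging; this sketch assumes the signed values are averaged first. The exponent is used exactly as written in the description, with no rescaling.

```python
import math
from collections import defaultdict


def mean_abs_ddg(records):
    """Collapse per-site stability changes into one value per substitution.

    records: iterable of (source, destination, ddg) tuples (hypothetical layout).
    Assumption: signed ddG values are averaged, then the absolute value is taken.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for source, destination, ddg in records:
        sums[(source, destination)] += ddg
        counts[(source, destination)] += 1
    return {pair: abs(sums[pair] / counts[pair]) for pair in sums}


def arrhenius_rates(pre_exponential, abs_ddg):
    """Apply q_ij = a_ij * exp(-|ddG_ij|) to every pair present in both inputs."""
    return {
        pair: pre_exponential[pair] * math.exp(-abs_ddg[pair])
        for pair in abs_ddg
        if pair in pre_exponential
    }


# Made-up illustrative data, not values from the article.
records = [("A", "G", 1.0), ("A", "G", 3.0), ("G", "A", -0.5)]
a = {("A", "G"): 1.0, ("G", "A"): 2.0}  # hypothetical pre-exponential factors

energies = mean_abs_ddg(records)
rates = arrhenius_rates(a, energies)
```

With these inputs, a larger averaged stability change for a substitution gives a smaller rate for the same pre-exponential factor, which is the tuning role the description attributes to the exponential term.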
first_indexed 2024-03-11T09:57:50Z
format Article
id doaj.art-5ef76ec2aea4452cae9b9b0b624f776c
institution Directory Open Access Journal
issn 1661-6596
1422-0067
language English
last_indexed 2024-03-11T09:57:50Z
publishDate 2023-01-01
publisher MDPI AG
record_format Article
series International Journal of Molecular Sciences
spelling doaj.art-5ef76ec2aea4452cae9b9b0b624f776c; 2023-11-16T15:38:43Z; eng; MDPI AG; International Journal of Molecular Sciences; ISSN 1661-6596, 1422-0067; 2023-01-01; 24(1):796; DOI 10.3390/ijms24010796
Proteome-Wide Structural Computations Provide Insights into Empirical Amino Acid Substitution Matrices
Pablo Aledo (0), Juan Carlos Aledo (1)
(0) Department of Molecular Biology and Biochemistry, University of Málaga, 29071 Málaga, Spain
(1) Department of Molecular Biology and Biochemistry, University of Málaga, 29071 Málaga, Spain
https://www.mdpi.com/1422-0067/24/1/796
amino acid substitution; fitness; genetic code; mutation; protein evolution; protein stability
title Proteome-Wide Structural Computations Provide Insights into Empirical Amino Acid Substitution Matrices
topic amino acid substitution
fitness
genetic code
mutation
protein evolution
protein stability
url https://www.mdpi.com/1422-0067/24/1/796