Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins.

Bibliographic Details
Main Authors: Stefano Pascarelli, Paola Laurino
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-04-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1010016
collection DOAJ
description Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. In order to go beyond the existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify "inter-paralog inversions", i.e., sites where the relationship between the ancestry and the functional signal is decoupled. The amino acids in these sites are masked from being recognized by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that went through inversion after gene duplication, mostly located at the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplications with direct implications in protein functional annotation and sequence evolution. The developed method is optimized to work with large protein datasets and can be readily included in a targeted protein analysis pipeline.
first_indexed 2024-12-12T04:53:37Z
id doaj.art-dc06018d96604b7fadf2abdf9efae1aa
institution Directory of Open Access Journals
issn 1553-734X
1553-7358
last_indexed 2024-12-12T04:53:37Z
spelling doaj.art-dc06018d96604b7fadf2abdf9efae1aa 2022-12-22T00:37:25Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2022-04-01 18 4 e1010016 10.1371/journal.pcbi.1010016 Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins. Stefano Pascarelli Paola Laurino https://doi.org/10.1371/journal.pcbi.1010016