Phylogenetic Gaussian process model for the inference of functionally important regions in protein tertiary structures.

A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models...

Bibliographic Details
Main Authors: Yi-Fei Huang, G Brian Golding
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC3894161?pdf=render
collection DOAJ
description A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models have been developed to estimate site-specific substitution rates in proteins and the extraordinarily low substitution rates have been used as evidence of function. Most of the existing tools, e.g. Rate4Site, assume that site-specific substitution rates are independent across sites. However, site-specific substitution rates may be strongly correlated in the protein tertiary structure, since functionally important sites tend to be clustered together to form functional patches. We have developed a new model, GP4Rate, which incorporates the Gaussian process model with the standard phylogenetic model to identify slowly evolved regions in protein tertiary structures. GP4Rate uses the Gaussian process to define a nonparametric prior distribution of site-specific substitution rates, which naturally captures the spatial correlation of substitution rates. Simulations suggest that GP4Rate can potentially estimate site-specific substitution rates with a much higher accuracy than Rate4Site and tends to report slowly evolved regions rather than individual sites. In addition, GP4Rate can estimate the strength of the spatial correlation of substitution rates from the data. By applying GP4Rate to a set of mammalian B7-1 genes, we found a highly conserved region which coincides with experimental evidence. GP4Rate may be a useful tool for the in silico prediction of functionally important regions in the proteins with known structures.
first_indexed 2024-12-12T02:47:19Z
format Article
id doaj.art-2ebe45fd1788449c9750a5c4386e90ad
institution Directory of Open Access Journals
issn 1553-734X, 1553-7358
language English
last_indexed 2024-12-12T02:47:19Z
publishDate 2014-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-2ebe45fd1788449c9750a5c4386e90ad; 2022-12-22T00:40:59Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2014-01-01; vol. 10, no. 1, e1003429; doi:10.1371/journal.pcbi.1003429