Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different R...

Bibliographic Details
Main Authors: Seyed Morteza Najibi, Mehdi Maadooliat, Lan Zhou, Jianhua Z. Huang, Xin Gao
Format: Article
Language: English
Published: Elsevier 2017-01-01
Series: Computational and Structural Biotechnology Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037016300885
author Seyed Morteza Najibi
Mehdi Maadooliat
Lan Zhou
Jianhua Z. Huang
Xin Gao
collection DOAJ
description Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is to efficiently model the continuous conformational space of protein structures based on the differences and similarities between different Ramachandran plots. Although statistical methods for modeling angular data of proteins exist, there is still a substantial need for more sophisticated and faster statistical tools to model large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data by using trigonometric splines, which are more efficient than existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of the adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel perspective on two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.
Keywords: Bivariate splines, Log-spline density estimation, Protein structure, Ramachandran distribution, Roughness penalty, Trigonometric B-spline, Protein classification, SCOP
first_indexed 2024-04-13T03:34:50Z
format Article
id doaj.art-5b2729e7c06044deafec96080da240a3
institution Directory Open Access Journal
issn 2001-0370
language English
last_indexed 2024-04-13T03:34:50Z
publishDate 2017-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-5b2729e7c06044deafec96080da240a3 (2022-12-22T03:04:22Z); eng; Elsevier; Computational and Structural Biotechnology Journal; ISSN 2001-0370; 2017-01-01; vol. 15, pp. 243-254
Seyed Morteza Najibi: Department of Statistics, College of Sciences, Shiraz University, Shiraz, Iran
Mehdi Maadooliat: Department of Mathematics, Statistics and Computer Science, Marquette University, WI 53201-1881, USA; Center for Human Genetics, Marshfield Clinic Research Institute, Marshfield, WI 54449, USA
Lan Zhou: Department of Statistics, Texas A&M University, TX 77843-3143, USA
Jianhua Z. Huang: Department of Statistics, Texas A&M University, TX 77843-3143, USA
Xin Gao (corresponding author): Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
http://www.sciencedirect.com/science/article/pii/S2001037016300885
title Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions
url http://www.sciencedirect.com/science/article/pii/S2001037016300885