A Peptides Prediction Methodology with Fragments and CNN for Tertiary Structure Based on GRSA2

Proteins are macromolecules essential for living organisms. However, to perform their function, proteins need to achieve their Native Structure (NS). The NS is reached fast in nature. By contrast, in silico, it is obtained by solving the Protein Folding problem (PFP) which currently has a long execu...

Full description

Bibliographic Details
Main Authors: Juan P. Sánchez-Hernández, Juan Frausto-Solís, Diego A. Soto-Monterrubio, Juan J. González-Barbosa, Edgar Roman-Rangel
Format: Article
Language:English
Published: MDPI AG 2022-12-01
Series:Axioms
Subjects:
Online Access:https://www.mdpi.com/2075-1680/11/12/729
Description
Summary:Proteins are macromolecules essential for living organisms. However, to perform their function, proteins need to achieve their Native Structure (NS). The NS is reached fast in nature. By contrast, in silico, it is obtained by solving the Protein Folding problem (PFP) which currently has a long execution time. PFP is computationally an NP-hard problem and is considered one of the biggest current challenges. There are several methods following different strategies for solving PFP. The most successful combine computational methods and biological information: I-TASSER, Rosetta (Robetta server), AlphaFold2 (CASP14 Champion), QUARK, PEP-FOLD3, TopModel, and GRSA2-SSP. The first three named methods obtained the highest quality at CASP events, and all apply the Simulated Annealing or Monte Carlo method, Neural Network, and fragments assembly methodologies. In the present work, we propose the GRSA2-FCNN methodology, which assembles fragments applied to peptides and is based on the GRSA2 and Convolutional Neural Networks (CNN). We compare GRSA2-FCNN with the best state-of-the-art algorithms for PFP, such as I-TASSER, Rosetta, AlphaFold2, QUARK, PEP-FOLD3, TopModel, and GRSA2-SSP. Our methodology is applied to a dataset of 60 peptides and achieves the best performance of all methods tested based on the common metrics TM-score, RMSD, and GDT-TS of the area.
ISSN:2075-1680