Abstract

The amino acid sequence of a protein contains all the information needed to specify its shape, which in turn dictates its biological activities. However, determining the three-dimensional structure of proteins experimentally is challenging and expensive. Backbone torsion angles play a critical role in protein structure prediction: accurately predicting these angles can considerably advance tertiary structure prediction by enabling efficient sampling of the large conformational space in search of low-energy structures. Here we propose, for the first time, evolutionary signatures computed from protein sequence profiles, together with a novel recurrent architecture, termed ESIDEN, built from straightforward recurrent neural networks with a small number of learnable parameters. ESIDEN captures useful information from both the classic and the new features by exploiting the different ways in which its recurrent components process information. In addition, compared with widely used classic features, the new features, especially the Ramachandran basin potential, provide statistical and evolutionary information that improves prediction accuracy. On four widely used benchmark datasets, ESIDEN predicts torsion angles significantly more accurately than the best existing methods. As demonstrated in this study, the predicted angles can be used as structural constraints to accurately infer protein tertiary structures. Moreover, the proposed features could help improve machine learning-based methods for protein folding, structure prediction, and function prediction. The source code and data are available at https://kornmann.bioch.ox.ac.uk/leri/resources/download.html.
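The abstract centers on backbone torsion angles and on how accurately predicted angles match reference values. As a minimal sketch, assuming the standard definitions (phi from C(i-1), N, CA, C and psi from N, CA, C, N(i+1)) and the periodic mean absolute error that is commonly reported for this task (the abstract does not say which metric ESIDEN is evaluated with), the Python below computes reference angles from backbone coordinates and scores predictions against them. The per-residue dictionary layout with "N", "CA", "C" keys and the function names `dihedral`, `phi_psi`, and `angular_mae` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, range (-180, 180]) defined by four points.

    Follows the IUPAC sign convention: positive when, viewed along p1 -> p2,
    the bond p1-p0 must rotate clockwise to eclipse the bond p2-p3.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1n = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1n) * b1n
    w = b2 - np.dot(b2, b1n) * b1n
    x = np.dot(v, w)
    y = np.dot(np.cross(b1n, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(backbone):
    """Return a (phi, psi) tuple per residue; hypothetical input format.

    `backbone` is a list of dicts with "N", "CA", "C" coordinates.
    phi is undefined (None) for the first residue and psi for the last.
    Chain breaks are not detected in this sketch.
    """
    n = len(backbone)
    out = []
    for i, res in enumerate(backbone):
        phi = (dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"])
               if i > 0 else None)
        psi = (dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
               if i < n - 1 else None)
        out.append((phi, psi))
    return out


def angular_mae(pred, true):
    """Mean absolute error in degrees, taking the shorter way around the circle."""
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)) % 360.0
    d = np.minimum(d, 360.0 - d)
    return float(d.mean())
```

The periodic wrap in `angular_mae` is the key design choice: a prediction of 179 degrees against a reference of -179 degrees is only 2 degrees off, whereas a naive absolute difference would report 358 degrees.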