Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach

Background: Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods conducted on large datasets, the estimated upper limit accuracy is yet to be reached. Since the predictions of SSP methods are applied as input to higher-level s...

Bibliographic Details
Main Authors:	Rashid, Shamima; Saraswathi, Saras; Kloczkowski, Andrzej; Sundaram, Suresh; Kolinski, Andrzej
Other Authors:	School of Computer Engineering
Format:	Journal Article
Language:	English
Published:	2016
Subjects:	Heuristics; Secondary structure prediction
Online Access:	https://hdl.handle.net/10356/84028 http://hdl.handle.net/10220/41590

_version_	1811696025623068672
author	Rashid, Shamima Saraswathi, Saras Kloczkowski, Andrzej Sundaram, Suresh Kolinski, Andrzej
author2	School of Computer Engineering
author_facet	School of Computer Engineering Rashid, Shamima Saraswathi, Saras Kloczkowski, Andrzej Sundaram, Suresh Kolinski, Andrzej
author_sort	Rashid, Shamima
collection	NTU
description	Background: Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods conducted on large datasets, the estimated upper limit accuracy is yet to be reached. Since the predictions of SSP methods are applied as input to higher-level structure prediction pipelines, even small errors may cause large perturbations in final models. Previous works relied on cross validation as an estimate of classifier accuracy. However, training on large numbers of protein chains compromises the classifier's ability to generalize to new sequences. This prompts a novel approach to training and an investigation into the possible structural factors that lead to poor predictions. Here, a small group of 55 proteins termed the compact model is selected from the CB513 dataset using a heuristics-based approach. In a prior work, all sequences were represented as probability matrices of residues adopting each of Helix, Sheet and Coil states, based on energy calculations using the C-Alpha, C-Beta, Side-chain (CABS) algorithm. The functional relationship between the conformational energies computed with the CABS force field and residue states is approximated using a classifier termed the Fully Complex-valued Relaxation Network (FCRN). The FCRN is trained with the compact model proteins. Results: The performance of the compact model is compared with traditional cross-validated accuracies and blind-tested on a dataset of G Switch proteins, obtaining accuracies of ∼81%. The model demonstrates better results when compared to several techniques in the literature. A comparative case study of the worst performing chain identifies hydrogen bond contacts that lead to Coil ⇔ Sheet misclassifications. Overall, mispredicted Coil residues have a higher propensity to participate in backbone hydrogen bonding than correctly predicted Coils.
Conclusions: The implications of these findings are: (i) the choice of training proteins is important in preserving the generalization of a classifier to predict new sequences accurately and (ii) SSP techniques sensitive enough to distinguish between backbone hydrogen bonding and side-chain or water-mediated hydrogen bonding might be needed to reduce Coil ⇔ Sheet misclassifications.
first_indexed	2024-10-01T07:32:48Z
format	Journal Article
id	ntu-10356/84028
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T07:32:48Z
publishDate	2016
record_format	dspace
spelling	ntu-10356/84028 2022-02-16T16:31:21Z Published version 2016-10-27T09:03:54Z 2019-12-06T15:36:47Z 2016 Journal Article Rashid, S., Saraswathi, S., Kloczkowski, A., Sundaram, S., & Kolinski, A. (2016). Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach. BMC Bioinformatics, 17, 362. ISSN 1471-2105 https://hdl.handle.net/10356/84028 http://hdl.handle.net/10220/41590 DOI 10.1186/s12859-016-1209-0 PMID 27618812 en BMC Bioinformatics © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. 18 p. application/pdf
title	Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach
topic	Heuristics Secondary structure prediction
url	https://hdl.handle.net/10356/84028 http://hdl.handle.net/10220/41590

Similar Items