On the relevance of sophisticated structural annotations for disulfide connectivity pattern prediction.

Bibliographic Details
Main Authors: Julien Becker, Francis Maes, Louis Wehenkel
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3574028?pdf=render
collection DOAJ
description Disulfide bridges strongly constrain the native structure of many proteins, and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches for this prediction problem adopt the following pipeline: first, they enrich the primary sequence with structural annotations; second, they apply a binary classifier to each candidate pair of cysteines to predict disulfide bonding probabilities; and finally, they use a maximum weight graph matching algorithm to derive the predicted disulfide connectivity pattern of a protein. In this paper, we adopt this three-step pipeline and propose an extensive study of the relevance of various structural annotations and feature encodings. In particular, we consider five kinds of structural annotations, three of which are novel in the context of disulfide bridge prediction. To be usable by machine learning algorithms, these annotations must be encoded into features. For this purpose, we propose four different feature encodings based on local windows and on different kinds of histograms. Combining the structural annotations with these encodings leads to a large number of possible feature functions. To identify a minimal subset of relevant feature functions among those, we propose an efficient and interpretable feature function selection scheme, designed to avoid any form of overfitting. We apply this scheme on top of three supervised learning algorithms: k-nearest neighbors, support vector machines and extremely randomized trees. Our results indicate that using only the PSSM (position-specific scoring matrix) together with the CSP (cysteine separation profile) is sufficient to construct a high-performance disulfide pattern predictor and that extremely randomized trees reach a disulfide pattern prediction accuracy of [Formula: see text] on the benchmark dataset SPX[Formula: see text], which corresponds to [Formula: see text] improvement over the state of the art. A web application is available at http://m24.giga.ulg.ac.be:81/x3CysBridges.
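The last step of the pipeline described above, turning pairwise bonding probabilities into a connectivity pattern, can be illustrated with a small sketch. This is not the authors' implementation: it replaces a proper maximum weight graph matching algorithm with a brute-force enumeration that is only practical for a handful of cysteines, it assumes every cysteine is bonded (an even count), and the function name `best_pattern`, the residue positions and the probabilities are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[int, int]


def best_pattern(
    cysteines: List[int], prob: Dict[FrozenSet[int], float]
) -> Tuple[float, List[Pair]]:
    """Return (total weight, pairs) of the heaviest perfect matching.

    Brute-force stand-in for a maximum weight graph matching algorithm:
    the first cysteine is paired with every other one in turn and the
    remaining cysteines are matched recursively.
    """
    if len(cysteines) < 2:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        score, pairs = best_pattern(remaining, prob)
        # Missing pairs are treated as having zero bonding probability.
        score += prob.get(frozenset((first, partner)), 0.0)
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


# Hypothetical classifier outputs for four cysteines (made-up values).
pair_probabilities = {
    frozenset((5, 12)): 0.2,
    frozenset((5, 30)): 0.9,
    frozenset((5, 41)): 0.1,
    frozenset((12, 30)): 0.3,
    frozenset((12, 41)): 0.8,
    frozenset((30, 41)): 0.6,
}

score, pattern = best_pattern([5, 12, 30, 41], pair_probabilities)
print(pattern)  # [(5, 30), (12, 41)]
```

In this toy input the pair (30, 41) has a fairly high individual probability, yet it is not selected because the pattern {5-30, 12-41} has the larger total weight. This is why the matching step operates on the whole graph rather than thresholding each pair independently.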
id doaj.art-0c277cd0830142509704d0f3ef9e572c
institution Directory Open Access Journal
issn 1932-6203
citation PLoS ONE 8(2): e56621 (2013). doi:10.1371/journal.pone.0056621