Overlapping Clusters and Support Vector Machines Based Interval Type-2 Fuzzy System for the Prediction of Peptide Binding Affinity

Bibliographic Details
Main Authors: Volkan Uslan, Huseyin Seker, Robert John
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/8685099/
collection DOAJ
description In the post-genome era, biological datasets that are high-dimensional, nonlinear, and available only in small numbers of instances are becoming increasingly difficult to process. This paper aims to address these characteristics, as they adversely affect the performance of predictive models in bioinformatics. An interval type-2 Takagi-Sugeno fuzzy predictive model is proposed to manage the high dimensionality and nonlinearity that are common features of such datasets. To this end, a new clustering framework, based on the overlapping regions between clusters, is proposed to simplify antecedent operations for an interval type-2 fuzzy system. Cluster analysis of the partitions, together with statistical information derived from them, identifies the upper and lower membership functions that form the premise part. The model is further enhanced by adapting the regression version of support vector machines in the consequent part. The proposed method is used in experiments to quantitatively predict the binding affinities of peptides to biomolecules. This case study poses a challenge in post-genome studies and remains an open problem owing to the complexity of the biological system, the diversity of peptides, and the curse of dimensionality arising from the amino acid index representation that characterizes the peptides. On four different peptide binding affinity datasets, the proposed method generalized better on all of them, improving prediction accuracy on unseen peptides by up to 58.2% compared with the predictive methods presented in the literature. Source code of the algorithm is available at https://github.com/sekerbigdatalab.
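To make the structure named in the description concrete, the following is a minimal sketch of an interval type-2 Takagi-Sugeno prediction step. It is not the authors' implementation (their code is at the GitHub address above) and rests on several assumptions: Gaussian membership functions with an uncertain width stand in for the overlapping-cluster-derived upper and lower membership functions, a plain linear function stands in for the support vector regression consequent, and Nie-Tan style averaging of the lower and upper firing strengths is used for the output. All function names and the rule dictionary layout are hypothetical.

```python
import math


def gaussian(x, center, sigma):
    """Ordinary (type-1) Gaussian membership grade."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def it2_membership(x, center, sigma_lower, sigma_upper):
    """Return (lower, upper) grades for a Gaussian with uncertain width.

    Assumption: sigma_lower <= sigma_upper, so the narrow Gaussian is the
    lower membership function and the wide one is the upper.
    """
    return gaussian(x, center, sigma_lower), gaussian(x, center, sigma_upper)


def firing_interval(x, rule):
    """Product t-norm over all inputs, tracked separately for both bounds."""
    lower, upper = 1.0, 1.0
    for xi, (center, s_lo, s_up) in zip(x, rule["antecedents"]):
        lo, up = it2_membership(xi, center, s_lo, s_up)
        lower *= lo
        upper *= up
    return lower, upper


def consequent(x, rule):
    """Linear rule output; a stand-in for a per-rule regression model."""
    return rule["bias"] + sum(w * xi for w, xi in zip(rule["weights"], x))


def predict(x, rules):
    """Nie-Tan style output: weight each rule by (lower + upper) firing."""
    numerator = 0.0
    denominator = 0.0
    for rule in rules:
        f_lo, f_up = firing_interval(x, rule)
        weight = f_lo + f_up
        numerator += weight * consequent(x, rule)
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no rule fires for this input")
    return numerator / denominator


if __name__ == "__main__":
    # Two hypothetical one-input rules with illustrative parameters.
    rules = [
        {"antecedents": [(-1.0, 0.5, 1.0)], "weights": [0.0], "bias": 0.0},
        {"antecedents": [(1.0, 0.5, 1.0)], "weights": [0.0], "bias": 4.0},
    ]
    print(predict([0.0], rules))
```

The gap between the lower and upper grades is what distinguishes an interval type-2 rule from a type-1 rule; in the described method that gap would come from the overlap between clusters rather than from a hand-chosen pair of widths.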
institution Directory Open Access Journal
issn 2169-3536
spelling doaj.art-cf5fffda85c44c1e90a89bcbe082472f 2022-12-21T20:01:16Z eng
IEEE. IEEE Access, ISSN 2169-3536, 2019-01-01, vol. 7, pp. 49756-49764. DOI: 10.1109/ACCESS.2019.2910078. IEEE document 8685099.
Volkan Uslan (https://orcid.org/0000-0001-6252-8853), School of Computer Science and Informatics, De Montfort University, Leicester, U.K.
Huseyin Seker, Department of Computer Science and Digital Technologies, University of Northumbria at Newcastle, Newcastle upon Tyne, U.K.
Robert John, School of Computer Science, University of Nottingham, Nottingham, U.K.
topic Interval type-2 fuzzy systems
support vector regression
overlapping clusters
peptide binding affinity
clustering
high-dimensionality