Inferring linear-B cell epitopes using 2-step metaheuristic variant-feature selection using genetic algorithm

Abstract Linear-B cell epitopes (LBCE) play a vital role in vaccine design; thus, efficiently detecting them from protein sequences is of primary importance. These epitopes consist of amino acids arranged in continuous or discontinuous patterns. Vaccines employ attenuated viruses and purified antige...

Bibliographic Details
Main Authors: Pratik Angaitkar, Turki Aljrees, Saroj Kumar Pandey, Ankit Kumar, Rekh Ram Janghel, Tirath Prasad Sahu, Kamred Udham Singh, Teekam Singh
Format: Article
Language: English
Published: Nature Portfolio 2023-09-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-41179-1
author Pratik Angaitkar
Turki Aljrees
Saroj Kumar Pandey
Ankit Kumar
Rekh Ram Janghel
Tirath Prasad Sahu
Kamred Udham Singh
Teekam Singh
collection DOAJ
description Abstract Linear-B cell epitopes (LBCE) play a vital role in vaccine design; thus, efficiently detecting them from protein sequences is of primary importance. These epitopes consist of amino acids arranged in continuous or discontinuous patterns. Vaccines employ attenuated viruses and purified antigens. LBCE stimulate humoral immunity in the body, where B and T cells target circulating infections. To predict LBCE, the underlying protein sequences undergo feature extraction, feature selection, and classification. Various system models have been proposed for this purpose, but their classification accuracy is only moderate. To improve LBCE classification accuracy, this paper presents a novel 2-step metaheuristic variant-feature selection method that combines a linear support vector classifier (LSVC) with a Modified Genetic Algorithm (MGA). The feature selection model employs mono-peptide, dipeptide, and tripeptide features, focusing on the most diverse ones. The selected features are fed into a machine learning (ML)-based parallel ensemble classifier, which combines correctly classified instances from several classifiers, including k-Nearest Neighbor (kNN), random forest (RF), logistic regression (LR), and support vector machine (SVM). The ensemble classifier achieved an accuracy of 99.3%, which the authors report as higher than that of recent state-of-the-art models for linear B-cell epitope classification. The authors conclude that the system model can therefore be used in real-time clinical settings.
first_indexed 2024-03-09T15:10:29Z
format Article
id doaj.art-18cdb6ced6f54bc88fc5d1a298c7a86b
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-03-09T15:10:29Z
publishDate 2023-09-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-18cdb6ced6f54bc88fc5d1a298c7a86b; 2023-11-26T13:23:23Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2023-09-01; volume 13, issue 1, pages 1-12; 10.1038/s41598-023-41179-1
Pratik Angaitkar (0): Department of Information Technology, National Institute of Technology, Raipur
Turki Aljrees (1): College of Computer Science and Engineering, University of Hafr Al Batin
Saroj Kumar Pandey (2): Department of Computer Engineering & Applications, GLA University
Ankit Kumar (3): Department of Computer Engineering & Applications, GLA University
Rekh Ram Janghel (4): Department of Information Technology, National Institute of Technology, Raipur
Tirath Prasad Sahu (5): Department of Information Technology, National Institute of Technology, Raipur
Kamred Udham Singh (6): School of Computing, Graphic Era Hill University
Teekam Singh (7): Department of Computer Science and Engineering, Graphic Era Deemed to be University
title Inferring linear-B cell epitopes using 2-step metaheuristic variant-feature selection using genetic algorithm
url https://doi.org/10.1038/s41598-023-41179-1
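The abstract names mono-peptide, dipeptide, and tripeptide features as the input to feature selection but does not say how they are encoded. The sketch below shows one common encoding, assumed here for illustration only: the relative frequency of every possible 1-, 2-, and 3-residue window over the 20 standard amino acids. The function names `kmer_composition` and `peptide_features` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k):
    """Relative frequency of every possible k-residue window (assumed encoding)."""
    sequence = sequence.upper()
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    # Windows containing non-standard residues count toward `total`
    # but have no feature slot of their own.
    return [
        counts["".join(kmer)] / total if total else 0.0
        for kmer in product(AMINO_ACIDS, repeat=k)
    ]


def peptide_features(sequence):
    """Concatenate mono-, di-, and tripeptide compositions into one vector."""
    features = []
    for k in (1, 2, 3):
        features.extend(kmer_composition(sequence, k))
    return features
```

Under this encoding each sequence maps to 20 + 400 + 8000 = 8420 values, most of them zero for short peptides, which is the kind of wide, sparse input that a feature selection step is meant to prune.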