Inferring linear-B cell epitopes using 2-step metaheuristic variant-feature selection using genetic algorithm
Abstract Linear-B cell epitopes (LBCE) play a vital role in vaccine design; thus, efficiently detecting them from protein sequences is of primary importance. These epitopes consist of amino acids arranged in continuous or discontinuous patterns. Vaccines employ attenuated viruses and purified antige...
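Main Authors: | Pratik Angaitkar, Turki Aljrees, Saroj Kumar Pandey, Ankit Kumar, Rekh Ram Janghel, Tirath Prasad Sahu, Kamred Udham Singh, Teekam Singh |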
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-09-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-023-41179-1 |
_version_ | 1797452562400542720 |
---|---|
author | Pratik Angaitkar, Turki Aljrees, Saroj Kumar Pandey, Ankit Kumar, Rekh Ram Janghel, Tirath Prasad Sahu, Kamred Udham Singh, Teekam Singh |
author_sort | Pratik Angaitkar |
collection | DOAJ |
description | Linear B-cell epitopes (LBCE) play a vital role in vaccine design, so detecting them efficiently from protein sequences is of primary importance. These epitopes consist of amino acids arranged in continuous or discontinuous patterns. Vaccines employ attenuated viruses and purified antigens. LBCE stimulate humoral immunity, in which B and T cells target circulating infections. To predict LBCE, protein sequences undergo feature extraction, feature selection, and classification. Various models have been proposed for this purpose, but their classification accuracy is only moderate. To improve LBCE classification accuracy, this paper presents a 2-step metaheuristic variant-feature selection method that combines a linear support vector classifier (LSVC) with a Modified Genetic Algorithm (MGA). The feature selection model uses mono-peptide, dipeptide, and tripeptide features, focusing on the most diverse ones. The selected features are fed into a machine learning (ML)-based parallel ensemble classifier, which combines correctly classified instances from several classifiers, including k-Nearest Neighbor (kNN), random forest (RF), logistic regression (LR), and support vector machine (SVM). The ensemble classifier achieved a reported accuracy of 99.3%, which the authors state exceeds that of recent state-of-the-art models for linear B-cell epitope classification. On this basis, the authors suggest that the system model could be used in real-time clinical settings. |
first_indexed | 2024-03-09T15:10:29Z |
format | Article |
id | doaj.art-18cdb6ced6f54bc88fc5d1a298c7a86b |
institution | Directory Open Access Journal |
issn | 2045-2322 |
language | English |
last_indexed | 2024-03-09T15:10:29Z |
publishDate | 2023-09-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-18cdb6ced6f54bc88fc5d1a298c7a86b; 2023-11-26T13:23:23Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2023-09-01; 131112; 10.1038/s41598-023-41179-1; Inferring linear-B cell epitopes using 2-step metaheuristic variant-feature selection using genetic algorithm; Pratik Angaitkar (Department of Information Technology, National Institute of Technology, Raipur); Turki Aljrees (College of Computer Science and Engineering, University of Hafr Al Batin); Saroj Kumar Pandey (Department of Computer Engineering & Applications, GLA University); Ankit Kumar (Department of Computer Engineering & Applications, GLA University); Rekh Ram Janghel (Department of Information Technology, National Institute of Technology, Raipur); Tirath Prasad Sahu (Department of Information Technology, National Institute of Technology, Raipur); Kamred Udham Singh (School of Computing, Graphic Era Hill University); Teekam Singh (Department of Computer Science and Engineering, Graphic Era Deemed to be University); https://doi.org/10.1038/s41598-023-41179-1 |
title | Inferring linear-B cell epitopes using 2-step metaheuristic variant-feature selection using genetic algorithm |
url | https://doi.org/10.1038/s41598-023-41179-1 |
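The description names mono-peptide, dipeptide, and tripeptide features, but this record does not say how they are computed. The following is a minimal sketch, assuming they are plain k-mer relative frequencies (k = 1, 2, 3) over the 20 standard amino acids; the function names `kmer_composition` and `peptide_features` are hypothetical and are not taken from the article.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence, k):
    """Relative frequency of every possible k-mer, in fixed lexicographic order.

    Assumption: windows containing non-standard symbols (e.g. X, B, Z) are skipped.
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    keys = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    # A sequence shorter than k yields an all-zero vector.
    return [counts[key] / total if total else 0.0 for key in keys]


def peptide_features(sequence):
    """Concatenate mono-, di-, and tripeptide composition (20 + 400 + 8000 values)."""
    features = []
    for k in (1, 2, 3):
        features.extend(kmer_composition(sequence, k))
    return features
```

A vector of this size (8420 values per sequence) is sparse for short peptides, which is one reason a feature selection step such as the LSVC and genetic algorithm combination described above would be applied before classification.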