Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors

Abstract Background In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) still remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for...


Bibliographic Details
Main Authors: Emmanuel Onah, Philip F. Uzor, Ikenna Calvin Ugwoke, Jude Uche Eze, Sunday Tochukwu Ugwuanyi, Ifeanyi Richard Chukwudi, Akachukwu Ibezim
Format: Article
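Language: English
Published: BMC 2022-11-01
Series: BMC Bioinformatics
Subjects: HIV-1 protease; Stratified 10-fold CV; Machine learning; Cleavage site; Octapeptide sequence; Amino acid binary profile
Online Access: https://doi.org/10.1186/s12859-022-05017-x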
author Emmanuel Onah
Philip F. Uzor
Ikenna Calvin Ugwoke
Jude Uche Eze
Sunday Tochukwu Ugwuanyi
Ifeanyi Richard Chukwudi
Akachukwu Ibezim
collection DOAJ
description Abstract
Background: In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the treatment of human immunodeficiency virus (HIV) infection, which is responsible for AIDS. Peptide cleavage by HIV-1 protease is an essential step in the replication of HIV-1. Thus, correct and timely prediction of the cleavage site of HIV-1 protease can significantly speed up and optimize the discovery of novel HIV-1 protease inhibitors. In this work, we built and compared selected machine learning models for predicting the HIV-1 protease cleavage site, using as input variables a hybrid of numerical descriptors derived from octapeptide sequence information: bond composition, amino acid binary profile (AABP), and physicochemical properties. Our work differs from antecedent studies on the same subject in the combination of octapeptide descriptors and the method used. Instead of using various subsets of the dataset for training and testing the models, we combined the dataset, applied a 3-way data split, and then used a "stratified" 10-fold cross-validation technique alongside the testing set to evaluate the models.
Results: Among the 8 models evaluated in the "stratified" 10-fold CV experiment, logistic regression, multi-layer perceptron classifier, linear discriminant analysis, gradient boosting classifier, Naive Bayes classifier, and decision tree classifier, with AUC, F-score, and B. Acc. scores in the ranges of 0.91–0.96, 0.81–0.88, and 80.1–86.4%, respectively, have the closest predictive performance to the state-of-the-art model (AUC 0.96, F-score 0.80, and B. Acc. ~ 80.0%). By contrast, the perceptron classifier and K-nearest neighbors had significantly lower performance (AUC 0.77–0.82, F-score 0.53–0.69, and B. Acc. 60.0–68.5%) at p < 0.05. On further evaluation on the testing set, logistic regression and multi-layer perceptron classifier had the best performance (AUC of 0.97, F-score > 0.89, and B. Acc. > 90.0%), though linear discriminant analysis, gradient boosting classifier, and Naive Bayes classifier also performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%).
Conclusions: Logistic regression and multi-layer perceptron classifiers have predictive performance comparable to the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition, and standard physicochemical properties are used as input variables. In our future work, we hope to develop standalone software for HIV-1 protease cleavage site prediction utilizing the linear regression algorithm and the aforementioned octapeptide sequence descriptors.
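The abstract names three building blocks without showing them: a binary profile of the octapeptide, a stratified 10-fold split, and balanced accuracy (B. Acc.). The following is a minimal Python sketch of each, not the authors' code. It assumes AABP means the common 20-bit one-hot encoding per residue (the record does not give the article's exact definition), that labels are 1 for cleaved and 0 for non-cleaved, and that folds are filled round-robin per class. The function names and the example peptide are illustrative choices, not taken from the article.

```python
import random
from collections import defaultdict

# Assumption: the 20 standard residues, ordered by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_profile(octapeptide):
    """One-hot encode an 8-residue peptide into 8 * 20 = 160 binary features."""
    if len(octapeptide) != 8:
        raise ValueError("expected an octapeptide (8 residues)")
    features = []
    for residue in octapeptide.upper():
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unknown residue: {residue!r}")
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into folds that keep roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = cleaved)."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in y_true")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (true_pos / positives + true_neg / negatives)


if __name__ == "__main__":
    vector = binary_profile("SQNYPIVQ")  # illustrative octapeptide
    print(len(vector), sum(vector))
    labels = [1] * 20 + [0] * 80
    print([len(fold) for fold in stratified_folds(labels)])
    print(balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))
```

In practice, each fold would serve once as the validation set while a classifier such as logistic regression is fitted on the remaining nine. Balanced accuracy is reported alongside AUC and F-score because cleavage datasets tend to contain far fewer cleaved than non-cleaved peptides.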
first_indexed 2024-04-11T06:47:49Z
format Article
id doaj.art-5b26565151ea400f95ad901cf3185b0c
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-11T06:47:49Z
publishDate 2022-11-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-5b26565151ea400f95ad901cf3185b0c 2022-12-22T04:39:17Z eng BMC BMC Bioinformatics 1471-2105 2022-11-01 23 1 1 20 10.1186/s12859-022-05017-x
Emmanuel Onah (0), Department of Pharmaceutical and Medicinal Chemistry, University of Nigeria
Philip F. Uzor (1), Department of Pharmaceutical and Medicinal Chemistry, University of Nigeria
Ikenna Calvin Ugwoke (2), Department of Pharmaceutical Microbiology and Biotechnology, University of Nigeria
Jude Uche Eze (3), Department of Clinical Pharmacy and Pharmacy Management, University of Nigeria
Sunday Tochukwu Ugwuanyi (4), Department of Pharmaceutical Technology and Industrial Pharmacy, University of Nigeria
Ifeanyi Richard Chukwudi (5), Department of Clinical Pharmacy and Pharmacy Management, University of Nigeria
Akachukwu Ibezim (6), Department of Pharmaceutical and Medicinal Chemistry, University of Nigeria
title Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors
topic HIV-1 protease
Stratified 10-fold CV
Machine learning
Cleavage site
Octapeptide sequence
Amino acid binary profile
url https://doi.org/10.1186/s12859-022-05017-x