Exploration of the quantitative Structure-Activity relationships for predicting Cyclooxygenase-2 inhibition bioactivity by Machine learning approaches

Cyclooxygenase-2 (COX-2) overexpression in many humans plays a key role in carcinogenic and inflammation-associated diseases. Several therapeutic and pharmaceutic drugs have been developed but some of them either lack the potency to control the COX-2 related diseases or promote unwanted side effects...

Bibliographic Details
Main Authors: Kevin Tochukwu Dibia, Philomena Kanwulia Igbokwe, Godfrey Ifechukwu Ezemagu, Christian Oluchukwu Asadu
Format: Article
Language: English
Published: Elsevier 2022-01-01
Series: Results in Chemistry
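Subjects: Cyclooxygenase-2; Bioactivity; Quantitative Structure-Activity Relationship; Machine Learning; Molecular Fingerprints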
Online Access: http://www.sciencedirect.com/science/article/pii/S2211715621001776
collection DOAJ
description Cyclooxygenase-2 (COX-2) overexpression plays a key role in carcinogenic and inflammation-associated diseases in many people. Several therapeutic drugs have been developed, but some of them either lack the potency to control COX-2-related diseases or cause unwanted side effects. This study investigated the prospect of developing drugs with novel therapeutic and pharmacological properties using a quantitative structure–activity relationship (QSAR) model. The model applies chemical descriptors and supervised machine learning to predict the bioactivity classes of molecules for COX-2 inhibition, using real multidimensional COX-2 inhibitor data obtained from a curated database. PubChem fingerprints are the class of descriptor used in developing the model. A performance check is carried out on 22 scikit-learn models, and their predictive performance in classifying the bioactivity of compounds is compared in terms of validation accuracy. Unsupervised learning with principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) is applied to explore the effect of dimensionality in the training data. PCA reduces the number of variables in the dataset while preserving as much information as possible; t-SNE is used mainly for exploration and visualization of the multidimensional dataset. Although the models show high predictive performance overall, the eXtreme Gradient Boosting Classifier (XGB Classifier) performs best. Moreover, hyperparameter tuning and regularization yield excellent model statistics with higher predictive power under 10-fold cross-validation. Model metrics, including log-loss (0.1208), accuracy (0.9484), and Matthews correlation coefficient (0.8741), among others, were satisfactory. Furthermore, the developed model is validated using metrics recommended by the OECD for classification, such as precision, recall, and balanced accuracy. The results offer important pharmacological insight that can guide the design of novel bioactive drugs without undesired side effects. The proposed QSAR approach shows promise for drug development schemes that depart from conventional methods.
first_indexed 2024-04-11T12:51:48Z
format Article
id doaj.art-6ee5e1574348415eb994e411856c1630
institution Directory Open Access Journal
issn 2211-7156
language English
last_indexed 2024-04-11T12:51:48Z
publishDate 2022-01-01
publisher Elsevier
record_format Article
series Results in Chemistry
spelling doaj.art-6ee5e1574348415eb994e411856c1630; 2022-12-22T04:23:11Z; eng; Elsevier; Results in Chemistry; 2211-7156; 2022-01-01; 4; 100272
Exploration of the quantitative Structure-Activity relationships for predicting Cyclooxygenase-2 inhibition bioactivity by Machine learning approaches
Kevin Tochukwu Dibia (0), Philomena Kanwulia Igbokwe (1), Godfrey Ifechukwu Ezemagu (2), Christian Oluchukwu Asadu (3)
(0) Department of Chemical Engineering, Nnamdi Azikiwe University, Awka, Anambra State; Corresponding author.
(1) Department of Chemical Engineering, Nnamdi Azikiwe University, Awka, Anambra State
(2) Department of Chemical Engineering, Nnamdi Azikiwe University, Awka, Anambra State
(3) Department of Chemical Engineering, Gregory University, Uturu, Abia State, Nigeria
http://www.sciencedirect.com/science/article/pii/S2211715621001776
Cyclooxygenase-2; Bioactivity; Quantitative Structure-Activity Relationship; Machine Learning; Molecular Fingerprints
topic Cyclooxygenase-2
Bioactivity
Quantitative Structure-Activity Relationship
Machine Learning
Molecular Fingerprints