Exploration of the quantitative Structure-Activity relationships for predicting Cyclooxygenase-2 inhibition bioactivity by Machine learning approaches

Bibliographic Details
Main Authors: Kevin Tochukwu Dibia, Philomena Kanwulia Igbokwe, Godfrey Ifechukwu Ezemagu, Christian Oluchukwu Asadu
Format: Article
Language: English
Published: Elsevier 2022-01-01
Series: Results in Chemistry
Online Access: http://www.sciencedirect.com/science/article/pii/S2211715621001776
Description
Summary: Cyclooxygenase-2 (COX-2) overexpression in humans plays a key role in carcinogenic and inflammation-associated diseases. Several therapeutic and pharmaceutical drugs have been developed, but some of them either lack the potency to control COX-2-related diseases or cause unwanted side effects. This study investigated the prospects for developing drugs with novel therapeutic and pharmacological properties using a quantitative structure–activity relationship (QSAR) model. The model applies chemical descriptors and supervised machine learning to predict the bioactivity classes of molecules for COX-2 inhibition, using real multidimensional data on COX-2 inhibitors obtained from a curated database. PubChem fingerprints are the class of descriptor used to develop the model. The performance of 22 scikit-learn models was checked, and their predictive performance in classifying the bioactivity of compounds is compared in terms of validation accuracy. Two unsupervised learning algorithms, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), are applied to explore the effect of dimensionality in the training data. PCA reduces the number of variables in the dataset while preserving as much information as possible; t-SNE is mainly used to explore and visualize the multidimensional dataset. Although all the models showed high predictive performance, the eXtreme Gradient Boosting Classifier (XGB Classifier) performed best. Moreover, hyperparameter tuning and regularization yielded excellent model statistics with higher predictive power under 10-fold cross-validation. Model metrics, including log-loss (0.1208), accuracy score (0.9484), and Matthews correlation coefficient (0.8741), among others, proved adequately significant.
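To make the reported metrics concrete, here is a minimal plain-Python sketch of how log-loss, accuracy and the Matthews correlation coefficient (MCC) are computed for a binary active/inactive classifier. The function names, the 0.5 decision threshold and the toy labels and probabilities are hypothetical illustrations, not taken from the study; in practice, scikit-learn's `sklearn.metrics.log_loss`, `accuracy_score` and `matthews_corrcoef` compute the same quantities.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = active, 0 = inactive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of compounds whose bioactivity class is predicted correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, in [-1, 1]; uses all four confusion cells."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def log_loss(y_true, p_active, eps=1e-15):
    """Mean binary cross-entropy of predicted 'active' probabilities (lower is better)."""
    total = 0.0
    for t, p in zip(y_true, p_active):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) is never evaluated
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: true classes and a model's predicted P(active).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    p_active = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3]
    y_pred = [1 if p >= 0.5 else 0 for p in p_active]  # assumed 0.5 threshold
    print(accuracy(y_true, y_pred), mcc(y_true, y_pred))  # 0.75 0.5
    print(round(log_loss(y_true, p_active), 4))  # 0.4004
```

Log-loss scores the predicted probabilities themselves, whereas accuracy and MCC only see the thresholded classes; MCC is often preferred over accuracy when active and inactive compounds are imbalanced because it accounts for all four confusion-matrix cells.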
Furthermore, the developed model was validated using OECD-recommended classification metrics such as precision, recall, and balanced accuracy. The results of this study offer important pharmacological insight that could help in designing novel bioactive drugs without undesired side effects. The proposed QSAR approach shows promising performance when applied in drug development schemes that differ from conventional methods.
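The classification metrics named above can be illustrated with a short sketch. The plain-Python function below, whose name `validation_metrics` is hypothetical, derives precision, recall, specificity and balanced accuracy from the four cells of a binary confusion matrix. The counts in the usage example are invented to show why balanced accuracy is reported alongside accuracy on imbalanced active/inactive data; they are not results from the study.

```python
def validation_metrics(tp, fp, tn, fn):
    """Binary classification metrics from confusion-matrix counts.

    tp/fn: actives predicted active/inactive; tn/fp: inactives predicted inactive/active.
    A ratio whose denominator is zero is reported as 0.0 (an assumed convention).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0    # predicted actives that are truly active
    recall = tp / (tp + fn) if tp + fn else 0.0       # sensitivity: actives recovered
    specificity = tn / (tn + fp) if tn + fp else 0.0  # inactives recovered
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        # Mean of per-class recalls, so the majority class cannot dominate.
        "balanced_accuracy": (recall + specificity) / 2,
    }


if __name__ == "__main__":
    # Hypothetical imbalanced set: 10 actives, 90 inactives; half the actives are missed.
    m = validation_metrics(tp=5, fp=0, tn=90, fn=5)
    print(m["accuracy"], m["balanced_accuracy"])  # 0.95 0.75
```

In this toy case plain accuracy looks high (0.95) even though half of the active compounds are missed; balanced accuracy (0.75) and recall (0.5) expose that weakness.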
ISSN: 2211-7156