Discovering the Active Ingredients of Medicine and Food Homologous Substances for Inhibiting the Cyclooxygenase-2 Metabolic Pathway by Machine Learning Algorithms

Bibliographic Details
Main Authors: Yujia Tian, Zhixing Zhang, Aixia Yan
Format: Article
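Language: English
Published: MDPI AG 2023-09-01
Series: Molecules
Subjects: medicine and food homology (MFH); machine learning; ensemble learning; virtual screening; anti-inflammation
Online Access: https://www.mdpi.com/1420-3049/28/19/6782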
author Yujia Tian
Zhixing Zhang
Aixia Yan
collection DOAJ
description Cyclooxygenase-2 (COX-2) and microsomal prostaglandin E₂ synthase (mPGES-1) are two key targets in anti-inflammatory therapy. Medicine and food homology (MFH) substances have both edible and medicinal properties, providing a valuable resource for the development of novel, safe, and efficient COX-2 and mPGES-1 inhibitors. In this study, we collected active ingredients from 503 MFH substances and constructed the first comprehensive MFH database containing 27,319 molecules. Subsequently, we performed Murcko scaffold analysis and K-means clustering to analyze the composition of the constructed database in depth and evaluate its structural diversity. Furthermore, we employed four supervised machine learning algorithms, namely support vector machine (SVM), random forest (RF), deep neural networks (DNNs), and eXtreme Gradient Boosting (XGBoost), as well as ensemble learning, to establish 640 classification models and 160 regression models for COX-2 and mPGES-1 inhibitors. Among them, ModelA_ensemble_RF_1 emerged as the optimal classification model for COX-2 inhibitors, achieving Matthews correlation coefficient (MCC) values of 0.802 and 0.603 on the test set and external validation set, respectively. ModelC_RDKIT_SVM_2 was identified as the best regression model for COX-2 inhibitors, with root mean squared error (RMSE) values of 0.419 and 0.513 on the test set and external validation set, respectively. ModelD_ECFP_SVM_4 stood out as the top classification model for mPGES-1 inhibitors, attaining MCC values of 0.832 and 0.584 on the test set and external validation set, respectively. The optimal regression model for mPGES-1 inhibitors, ModelF_3D_SVM_1, gave RMSE values of 0.253 and 0.35 on the test set and external validation set, respectively. Finally, we proposed a ligand-based cascade virtual screening strategy, which integrated the well-performing supervised machine learning models with unsupervised learning: the self-organizing map (SOM) and molecular scaffold analysis. Using this virtual screening workflow, we discovered 10 potential COX-2 inhibitors and 15 potential mPGES-1 inhibitors from the MFH database. We further verified the candidates by molecular docking and investigated the interactions of the candidate molecules upon binding to COX-2 or mPGES-1. The constructed comprehensive MFH database lays a solid foundation for further research on and utilization of MFH substances. The series of well-performing machine learning models can be employed to predict the COX-2 and mPGES-1 inhibitory capabilities of unknown compounds, thereby aiding the discovery of anti-inflammatory medications. The potential COX-2 and mPGES-1 inhibitors identified through the cascade virtual screening approach provide insights and references for the design of highly effective and safe novel anti-inflammatory drugs.
first_indexed 2024-03-10T21:40:20Z
format Article
id doaj.art-fea74c7abf9b49f3833db318db7c3dce
institution Directory Open Access Journal
issn 1420-3049
language English
last_indexed 2024-03-10T21:40:20Z
publishDate 2023-09-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj.art-fea74c7abf9b49f3833db318db7c3dce 2023-11-19T14:45:30Z eng MDPI AG Molecules 1420-3049 2023-09-01 28 19 6782 10.3390/molecules28196782
Yujia Tian, Zhixing Zhang, Aixia Yan: State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 Beisanhuan East Road, Beijing 100029, China
title Discovering the Active Ingredients of Medicine and Food Homologous Substances for Inhibiting the Cyclooxygenase-2 Metabolic Pathway by Machine Learning Algorithms
topic medicine and food homology (MFH)
machine learning
ensemble learning
virtual screening
anti-inflammation
url https://www.mdpi.com/1420-3049/28/19/6782
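
The description above reports model quality as Matthews correlation coefficient (MCC) for the classification models and root mean squared error (RMSE) for the regression models. As a minimal sketch of how these two metrics are defined, assuming binary labels coded as 1 (inhibitor) and 0 (non-inhibitor), the following Python computes both from scratch. The function names `mcc` and `rmse` and all input values are hypothetical illustrations, not taken from the article, and this is not the authors' evaluation code.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero (undefined ratio).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up toy labels and values, for illustration only.
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
    print(rmse([1.0, 2.0], [1.0, 4.0]))
```

MCC ranges from -1 to 1 and accounts for all four cells of the confusion matrix, which makes it less sensitive to class imbalance than plain accuracy; a lower RMSE indicates predictions closer to the measured values.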