Development of Machine Learning Models for Accurately Predicting and Ranking the Activity of Lead Molecules to Inhibit PRC2 Dependent Cancer

Disruption of epigenetic processes to eradicate tumor cells is among the most promising interventions for cancer control. EZH2 (Enhancer of zeste homolog 2), a catalytic component of polycomb repressive complex 2 (PRC2), methylates lysine 27 of histone H3 to promote transcriptional silencing and is...

Full description

Bibliographic Details
Main Authors: Danishuddin, Vikas Kumar, Shraddha Parate, Ashutosh Bahuguna, Gihwan Lee, Myeong Ok Kim, Keun Woo Lee
Format: Article
Language: English
Published: MDPI AG 2021-07-01
Series: Pharmaceuticals
Subjects:
Online Access: https://www.mdpi.com/1424-8247/14/7/699
author Danishuddin
Vikas Kumar
Shraddha Parate
Ashutosh Bahuguna
Gihwan Lee
Myeong Ok Kim
Keun Woo Lee
collection DOAJ
description Disruption of epigenetic processes to eradicate tumor cells is among the most promising interventions for cancer control. EZH2 (enhancer of zeste homolog 2), a catalytic component of polycomb repressive complex 2 (PRC2), methylates lysine 27 of histone H3 to promote transcriptional silencing and is an important drug target for controlling cancer via epigenetic processes. In the present study, we developed predictive models of EZH2 inhibitory activity. Binary and multiclass models were built using SVM, random forest and XGBoost. The models were evaluated with rigorous validation approaches, including the predictiveness curve, Y-randomization and the applicability domain (AD). Eighteen descriptors selected with the Boruta method were used for modeling. For binary classification, random forest and XGBoost achieved accuracies of 0.80 and 0.82, respectively, on the external test set. For the multiclass models, random forest and XGBoost achieved accuracies of 0.73 and 0.75, respectively. Five hundred Y-randomization runs indicated that the models were robust and that the correlations did not arise by chance. Evaluation metrics from the predictiveness curve show that the eighteen selected descriptors predict active compounds with a total gain (TG) of 0.79 for XGBoost and 0.59 for random forest. The validated models were then used for virtual screening and molecular docking in search of potential hits. A total of 221 compounds were commonly predicted as active above the set probability threshold and also fell within the AD of the training set. Molecular docking revealed that three compounds have reasonable binding energies and favorable interactions with critical residues in the active site of EZH2. In conclusion, we highlight the potential of rigorously validated models for accurately predicting and ranking the activities of lead molecules against cancer epigenetic targets. The models presented in this study provide a platform for the development of EZH2 inhibitors.
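The Y-randomization check named in the description can be illustrated with a minimal sketch. This is not the authors' code: it assumes a toy k-nearest-neighbour classifier in place of random forest or XGBoost, synthetic two-descriptor data in place of the eighteen Boruta-selected descriptors, and hypothetical helper names (`knn_neighbors`, `knn_accuracy`, `y_randomization`, `make_toy_data`). The idea is to score the model once with the real activity labels, then many times with shuffled training labels, and to see how often a shuffled model does at least as well.

```python
import math
import random
from collections import Counter


def knn_neighbors(X_train, X_test, k=3):
    """For each test point, return the indices of its k nearest training points."""
    return [
        sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
        for x in X_test
    ]


def knn_accuracy(neighbors, y_train, y_test):
    """Accuracy of majority-vote k-NN, given precomputed neighbour indices."""
    hits = 0
    for idx, truth in zip(neighbors, y_test):
        vote = Counter(y_train[i] for i in idx).most_common(1)[0][0]
        hits += vote == truth
    return hits / len(y_test)


def y_randomization(X_train, y_train, X_test, y_test, n_runs=500, k=3, seed=0):
    """Compare the real-label score with scores obtained from shuffled labels."""
    rng = random.Random(seed)
    # k-NN only stores labels, so neighbours can be computed once and reused.
    neighbors = knn_neighbors(X_train, X_test, k)
    true_score = knn_accuracy(neighbors, y_train, y_test)
    permuted = []
    for _ in range(n_runs):
        shuffled = list(y_train)
        rng.shuffle(shuffled)  # break the descriptor-activity link
        permuted.append(knn_accuracy(neighbors, shuffled, y_test))
    # Empirical p-value: share of shuffled runs scoring at least as well.
    p_value = (1 + sum(s >= true_score for s in permuted)) / (n_runs + 1)
    return true_score, sum(permuted) / n_runs, p_value


def make_toy_data(n_per_class, rng):
    """Synthetic stand-in data: two Gaussian clusters, labels 0 and 1."""
    X, y = [], []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(42)
X_train, y_train = make_toy_data(40, rng)
X_test, y_test = make_toy_data(20, rng)
true_score, mean_permuted, p_value = y_randomization(X_train, y_train, X_test, y_test)
print(f"true accuracy={true_score:.2f}, "
      f"mean permuted accuracy={mean_permuted:.2f}, p={p_value:.3f}")
```

If the real-label accuracy sits well above the shuffled-label scores, the fitted relationship is unlikely to be a chance correlation. The default of 500 runs mirrors the number reported in the description; everything else here is illustrative.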
first_indexed 2024-03-10T09:28:51Z
format Article
id doaj.art-6bd675d8b0b6477d800e66e70a2bea5c
institution Directory Open Access Journal
issn 1424-8247
language English
last_indexed 2024-03-10T09:28:51Z
publishDate 2021-07-01
publisher MDPI AG
record_format Article
series Pharmaceuticals
spelling doaj.art-6bd675d8b0b6477d800e66e70a2bea5c 2023-11-22T04:40:26Z eng MDPI AG Pharmaceuticals 1424-8247 2021-07-01 14 7 699 10.3390/ph14070699
Danishuddin: Department of Bio & Medical Big Data (BK21 Program), Division of Life Sciences, Research Institute of Natural Science (RINS), Gyeongsang National University (GNU), 501 Jinju-daero, Jinju 5282, Korea
Vikas Kumar: Department of Bio & Medical Big Data (BK21 Program), Division of Life Sciences, Research Institute of Natural Science (RINS), Gyeongsang National University (GNU), 501 Jinju-daero, Jinju 5282, Korea
Shraddha Parate: Plant Molecular Biology and Biotechnology Research Center (PMBBRC), Division of Applied Life Science, Gyeongsang National University (GNU), 501 Jinju-daero, Jinju 52828, Korea
Ashutosh Bahuguna: Department of Food Science and Technology, Yeungnam University, Gyeongsan 38541, Gyeongsangbuk-do, Korea
Gihwan Lee: Division of Applied Life Sciences, Gyeongsang National University (GNU), 501 Jinju-daero, Jinju 5282, Korea
Myeong Ok Kim: Division of Life Science and Applied Life Science (BK 21 Four), College of Natural Sciences, Gyeongsang National University, Jinju 5282, Korea
Keun Woo Lee: Department of Bio & Medical Big Data (BK21 Program), Division of Life Sciences, Research Institute of Natural Science (RINS), Gyeongsang National University (GNU), 501 Jinju-daero, Jinju 5282, Korea
title Development of Machine Learning Models for Accurately Predicting and Ranking the Activity of Lead Molecules to Inhibit PRC2 Dependent Cancer
topic cancer
epigenetic
PRC2
machine learning
multi-class models
url https://www.mdpi.com/1424-8247/14/7/699