Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs

Optimizing drug properties during cancer drug development is important for saving research and development time and cost. To obtain anti-breast cancer drug candidates with good biological activity, this paper collected 1974 compounds. First, the top 20 molecular de...


Bibliographic Details
Main Authors: Jiajia Liu, Zhihui Zhou, Shanshan Kong, Zezhong Ma
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-07-01
Series: Frontiers in Oncology
Subjects: anti-breast cancer, parameter optimization, random forest, xgboost, bioactivity
Online Access: https://www.frontiersin.org/articles/10.3389/fonc.2022.956705/full
_version_ 1828400064944406528
author Jiajia Liu
Zhihui Zhou
Shanshan Kong
Zezhong Ma
author_sort Jiajia Liu
collection DOAJ
description Optimizing drug properties during cancer drug development is important for saving research and development time and cost. To obtain anti-breast cancer drug candidates with good biological activity, this paper collected 1974 compounds. First, the top 20 molecular descriptors with the greatest influence on biological activity were screened using XGBoost-based feature selection. Second, on this basis, pIC50 values were taken as feature data and a variety of machine learning algorithms were compared so as to select the most suitable algorithm for predicting the IC50 and pIC50 values. Preliminary results show that the Random Forest, XGBoost and Gradient-enhanced algorithms perform well and differ little, while the Support Vector Machine performs worst. The semi-automatic parameter adjustment method was then used to tune the parameters of the Random Forest, XGBoost and Gradient-enhanced algorithms to find the optimal parameters. The Random Forest algorithm was found to have high accuracy, excellent resistance to overfitting, and stable behavior; its prediction accuracy is 0.745. Finally, the accuracy of the results was verified by training the model with the preliminarily selected data. The approach provides an innovative solution for optimizing the properties of anti-breast cancer drugs and can better support their early research and development.
first_indexed 2024-12-10T09:25:59Z
format Article
id doaj.art-d06a7d5dc11e4c4bb1e2b78e544f929c
institution Directory Open Access Journal
issn 2234-943X
language English
last_indexed 2024-12-10T09:25:59Z
publishDate 2022-07-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Oncology
spelling doaj.art-d06a7d5dc11e4c4bb1e2b78e544f929c
2022-12-22T01:54:32Z
eng
Frontiers Media S.A.
Frontiers in Oncology
2234-943X
2022-07-01
12
10.3389/fonc.2022.956705
956705
Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs
Jiajia Liu
College of Science, North China University of Science and Technology, Tangshan, China
Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, China
The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, China
Zhihui Zhou
College of Science, North China University of Science and Technology, Tangshan, China
The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, China
Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, China
Shanshan Kong
College of Science, North China University of Science and Technology, Tangshan, China
Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, China
The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, China
Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, China
Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, China
Zezhong Ma
College of Science, North China University of Science and Technology, Tangshan, China
Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, China
Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, China
https://www.frontiersin.org/articles/10.3389/fonc.2022.956705/full
anti-breast cancer
parameter optimization
random forest
xgboost
bioactivity
title Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs
topic anti-breast cancer
parameter optimization
random forest
xgboost
bioactivity
url https://www.frontiersin.org/articles/10.3389/fonc.2022.956705/full