Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs
Optimizing drug properties during cancer drug development is very important for saving research and development time and cost. To obtain anti-breast cancer drug candidates with good biological activity, this paper collected 1974 compounds. First, the top 20 molecular de...
Main Authors: | Jiajia Liu, Zhihui Zhou, Shanshan Kong, Zezhong Ma |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2022-07-01 |
Series: | Frontiers in Oncology |
Subjects: | anti-breast cancer; parameter optimization; random forest; xgboost; bioactivity |
Online Access: | https://www.frontiersin.org/articles/10.3389/fonc.2022.956705/full |
_version_ | 1828400064944406528 |
---|---|
author | Jiajia Liu; Zhihui Zhou; Shanshan Kong; Zezhong Ma |
author_facet | Jiajia Liu; Zhihui Zhou; Shanshan Kong; Zezhong Ma |
author_sort | Jiajia Liu |
collection | DOAJ |
description | Optimizing drug properties during cancer drug development is very important for saving research and development time and cost. To obtain anti-breast cancer drug candidates with good biological activity, this paper collected 1974 compounds. First, the top 20 molecular descriptors with the most influence on biological activity were screened using XGBoost-based feature selection. Second, on this basis, pIC50 values were taken as feature data and a variety of machine learning algorithms were compared so as to select the most suitable algorithm for predicting IC50 and pIC50 values. Preliminary results show that the Random Forest, XGBoost and Gradient-enhanced algorithms perform well and differ little, while the Support Vector Machine performs worst. Then, a semi-automatic parameter adjustment method was used to tune the parameters of the Random Forest, XGBoost and Gradient-enhanced algorithms to find the optimal parameters. The Random Forest algorithm was found to have high accuracy, excellent resistance to overfitting, and stable behavior; its prediction accuracy is 0.745. Finally, the accuracy of the results is verified by training the model with the preliminarily selected data. This provides an innovative solution for optimizing the properties of anti-breast cancer drugs and can better support their early research and development. |
first_indexed | 2024-12-10T09:25:59Z |
format | Article |
id | doaj.art-d06a7d5dc11e4c4bb1e2b78e544f929c |
institution | Directory Open Access Journal |
issn | 2234-943X |
language | English |
last_indexed | 2024-12-10T09:25:59Z |
publishDate | 2022-07-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Oncology |
spelling | doaj.art-d06a7d5dc11e4c4bb1e2b78e544f929c; 2022-12-22T01:54:32Z; eng; Frontiers Media S.A.; Frontiers in Oncology; 2234-943X; 2022-07-01; 12; 10.3389/fonc.2022.956705; 956705; Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs; Jiajia Liu (0, 1, 2); Zhihui Zhou (3, 4, 5); Shanshan Kong (6, 7, 8, 9, 10); Zezhong Ma (11, 12, 13); [0] College of Science, North China University of Science and Technology, Tangshan, China; [1] Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, China; [2] The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, China; [3] College of Science, North China University of Science and Technology, Tangshan, China; [4] The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, China; [5] Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, China; [6] College of Science, North China University of Science and Technology, Tangshan, China; [7] Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, China; [8] The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, China; [9] Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, China; [10] Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, China; [11] College of Science, North China University of Science and Technology, Tangshan, China; [12] Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, China; [13] Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, China; Optimizing drug properties during cancer drug development is very important for saving research and development time and cost. To obtain anti-breast cancer drug candidates with good biological activity, this paper collected 1974 compounds. First, the top 20 molecular descriptors with the most influence on biological activity were screened using XGBoost-based feature selection. Second, on this basis, pIC50 values were taken as feature data and a variety of machine learning algorithms were compared so as to select the most suitable algorithm for predicting IC50 and pIC50 values. Preliminary results show that the Random Forest, XGBoost and Gradient-enhanced algorithms perform well and differ little, while the Support Vector Machine performs worst. Then, a semi-automatic parameter adjustment method was used to tune the parameters of the Random Forest, XGBoost and Gradient-enhanced algorithms to find the optimal parameters. The Random Forest algorithm was found to have high accuracy, excellent resistance to overfitting, and stable behavior; its prediction accuracy is 0.745. Finally, the accuracy of the results is verified by training the model with the preliminarily selected data. This provides an innovative solution for optimizing the properties of anti-breast cancer drugs and can better support their early research and development.; https://www.frontiersin.org/articles/10.3389/fonc.2022.956705/full; anti-breast cancer; parameter optimization; random forest; xgboost; bioactivity |
spellingShingle | Jiajia Liu; Zhihui Zhou; Shanshan Kong; Zezhong Ma; Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs; Frontiers in Oncology; anti-breast cancer; parameter optimization; random forest; xgboost; bioactivity |
title | Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs |
title_full | Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs |
title_fullStr | Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs |
title_full_unstemmed | Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs |
title_short | Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs |
title_sort | application of random forest based on semi automatic parameter adjustment for optimization of anti breast cancer drugs |
topic | anti-breast cancer; parameter optimization; random forest; xgboost; bioactivity |
url | https://www.frontiersin.org/articles/10.3389/fonc.2022.956705/full |
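
The description above refers to both IC50 and pIC50 values. As a minimal sketch of how the two relate, the snippet below uses the standard definition pIC50 = -log10(IC50 in mol/L). The record does not state the unit in which the paper reports IC50, so the nanomolar input and the function names `ic50_nm_to_pic50` and `pic50_to_ic50_nm` are assumptions made only for illustration.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar (assumed unit) to pIC50.

    pIC50 = -log10(IC50 in mol/L); 1 nM = 1e-9 mol/L, hence the constant 9.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Inverse conversion: pIC50 back to an IC50 in nanomolar."""
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # Hypothetical IC50 values, not taken from the paper's data set.
    for ic50 in (1.0, 250.0, 10000.0):
        print(f"IC50 = {ic50} nM, pIC50 = {ic50_nm_to_pic50(ic50):.3f}")
```

A lower IC50 corresponds to a higher pIC50, so a model that predicts pIC50 ranks more potent compounds higher on a scale that is roughly linear in log concentration.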