Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors

Abstract Conventional machine learning (ML) and deep learning (DL) play a key role in the selectivity prediction of kinase inhibitors. A number of models based on available datasets can be used to predict the kinase profile of compounds, but there is still controversy about the advantages and disadv...

Bibliographic Details
Main Authors:	Jiangxia Wu, Yihao Chen, Jingxing Wu, Duancheng Zhao, Jindi Huang, MuJie Lin, Ling Wang
Format:	Article
Language:	English
Published:	BMC 2024-01-01
Series:	Journal of Cheminformatics
Subjects:	Kinase profiling Machine learning Deep learning Molecular fingerprints Molecular graphs
Online Access:	https://doi.org/10.1186/s13321-023-00799-5

_version_	1797273445736644608
author	Jiangxia Wu Yihao Chen Jingxing Wu Duancheng Zhao Jindi Huang MuJie Lin Ling Wang
author_sort	Jiangxia Wu
collection	DOAJ
description	Abstract Conventional machine learning (ML) and deep learning (DL) play a key role in the selectivity prediction of kinase inhibitors. A number of models based on available datasets can be used to predict the kinase profile of compounds, but there is still controversy about the advantages and disadvantages of ML and DL for such tasks. In this study, we constructed a comprehensive benchmark dataset of kinase inhibitors, comprising 141,086 unique compounds and 216,823 well-defined bioassay data points for 354 kinases. We then systematically compared the performance of 12 ML and DL methods on the kinase profiling prediction task. Extensive experimental results reveal that (1) descriptor-based ML models generally slightly outperform fingerprint-based ML models in terms of predictive performance, and RF, as an ensemble learning approach, displays the overall best predictive performance; (2) single-task graph-based DL models are generally inferior to conventional descriptor- and fingerprint-based ML models, whereas the corresponding multi-task models generally improve the average accuracy of kinase profile prediction (for example, the multi-task FP-GNN model outperforms the conventional descriptor- and fingerprint-based ML models with an average AUC of 0.807); and (3) fusion models based on voting and stacking methods can further improve performance on the kinase profiling prediction task. Specifically, the RF::AtomPairs + FP2 + RDKitDes fusion model performs best, with the highest average AUC value of 0.825 on the test sets. These findings provide useful information for guiding the choice of ML and DL methods for kinase profiling prediction tasks. Finally, an online platform called KIPP ( https://kipp.idruglab.cn ) and Python software were developed based on the best models to support kinase profiling prediction, as well as various kinase inhibitor identification tasks including virtual screening, compound repositioning and target fishing.
first_indexed	2024-03-07T14:43:23Z
format	Article
id	doaj.art-20548c7c94374ecb8f669bdba07aed21
institution	Directory of Open Access Journals
issn	1758-2946
language	English
last_indexed	2024-03-07T14:43:23Z
publishDate	2024-01-01
publisher	BMC
record_format	Article
series	Journal of Cheminformatics
spelling	doaj.art-20548c7c94374ecb8f669bdba07aed21; 2024-03-05T20:06:10Z; eng; BMC; Journal of Cheminformatics; 1758-2946; 2024-01-01; 16; 1; 1; 22; 10.1186/s13321-023-00799-5; Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors; Jiangxia Wu (0), Yihao Chen (1), Jingxing Wu (2), Duancheng Zhao (3), Jindi Huang (4), MuJie Lin (5), Ling Wang (6); affiliation, identical for entries 0 to 6: Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology; https://doi.org/10.1186/s13321-023-00799-5; Kinase profiling; Machine learning; Deep learning; Molecular fingerprints; Molecular graphs
title	Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors
topic	Kinase profiling Machine learning Deep learning Molecular fingerprints Molecular graphs
url	https://doi.org/10.1186/s13321-023-00799-5

Similar Items