Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning

Bibliographic Details
Main Authors:	Xiaodan Yin, Chang-Yu Hsieh, Xiaorui Wang, Zhenxing Wu, Qing Ye, Honglei Bao, Yafeng Deng, Hongming Chen, Pei Luo, Huanxiang Liu, Tingjun Hou, Xiaojun Yao
Format:	Article
Language:	English
Published:	American Association for the Advancement of Science (AAAS) 2024-01-01
Series:	Research
Online Access:	https://spj.science.org/doi/10.34133/research.0292

_version_	1827266612349108224
author	Xiaodan Yin, Chang-Yu Hsieh, Xiaorui Wang, Zhenxing Wu, Qing Ye, Honglei Bao, Yafeng Deng, Hongming Chen, Pei Luo, Huanxiang Liu, Tingjun Hou, Xiaojun Yao
author_facet	Xiaodan Yin, Chang-Yu Hsieh, Xiaorui Wang, Zhenxing Wu, Qing Ye, Honglei Bao, Yafeng Deng, Hongming Chen, Pei Luo, Huanxiang Liu, Tingjun Hou, Xiaojun Yao
author_sort	Xiaodan Yin
collection	DOAJ
description	Deep learning (DL)-driven efficient synthesis planning may profoundly transform the paradigm for designing novel pharmaceuticals and materials. However, the progress of many DL-assisted synthesis planning (DASP) algorithms has suffered from the lack of reliable automated pathway evaluation tools. Because yield is a critical metric for evaluating chemical reactions, accurate prediction of reaction yields helps improve the practicality of DASP algorithms in real-world scenarios. Currently, accurately predicting yields of reactions of interest still faces numerous challenges, chiefly the absence of high-quality generic reaction yield datasets and of robust generic yield predictors. To compensate for the limitations of high-throughput yield datasets, we curated a generic reaction yield dataset containing 12 reaction categories and rich reaction condition information. Subsequently, by utilizing 2 pretraining tasks based on chemical reaction masked language modeling and contrastive learning, we proposed a powerful bidirectional encoder representations from transformers (BERT)-based reaction yield predictor named Egret. It achieved comparable or even superior performance to the best previous models on 4 benchmark datasets and established state-of-the-art performance on the newly curated dataset. We found that reaction-condition-based contrastive learning enhances the model’s sensitivity to reaction conditions, and that Egret is capable of capturing subtle differences between reactions involving identical reactants and products but different reaction conditions. Furthermore, we proposed a new scoring function that incorporates Egret into the evaluation of multistep synthesis routes. Test results showed that yield-incorporated scoring facilitated the prioritization of literature-supported high-yield reaction pathways for target molecules. In addition, through a meta-learning strategy, we further improved the reliability of the model’s predictions for reaction types with limited data and lower data quality. Our results suggest that Egret holds the potential to become an essential component of next-generation DASP tools.
first_indexed	2024-03-08T14:27:57Z
format	Article
id	doaj.art-48a7714075234799af6f43cd8b3ba619
institution	Directory of Open Access Journals
issn	2639-5274
language	English
last_indexed	2025-03-22T04:20:37Z
publishDate	2024-01-01
publisher	American Association for the Advancement of Science (AAAS)
record_format	Article
series	Research
spelling	doaj.art-48a7714075234799af6f43cd8b3ba619; 2024-04-28T07:49:17Z; eng; American Association for the Advancement of Science (AAAS); Research; 2639-5274; 2024-01-01; 7; 10.34133/research.0292; Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning
	Xiaodan Yin (0): Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China.
	Chang-Yu Hsieh (1): Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
	Xiaorui Wang (2): Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China.
	Zhenxing Wu (3): Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
	Qing Ye (4): Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
	Honglei Bao (5): Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China.
	Yafeng Deng (6): CarbonSilicon AI Technology Co. Ltd, Hangzhou, Zhejiang 310018, China.
	Hongming Chen (7): Center of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510530, China.
	Pei Luo (8): Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao 999078, China.
	Huanxiang Liu (9): Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China.
	Tingjun Hou (10): Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
	Xiaojun Yao (11): Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China.
	https://spj.science.org/doi/10.34133/research.0292
title	Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning
title_full	Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning
title_fullStr	Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning
title_full_unstemmed	Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning
title_short	Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning
title_sort	enhancing generic reaction yield prediction through reaction condition based contrastive learning
url	https://spj.science.org/doi/10.34133/research.0292
