Improving the performance of models for one-step retrosynthesis through re-ranking

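Abstract: Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given product, sets of reactants that can be used to synthesise that product. However, their performance as measured by the top-N accuracy in matching published reaction precedents still leaves room for improvement. This work aims to enhance these models by learning to re-rank their reactant predictions. Specifically, we design and train an energy-based model to re-rank, for each product, the published reaction as the top suggestion and the remaining reactant predictions as lower-ranked. We show that re-ranking can improve one-step models significantly using the standard USPTO-50k benchmark dataset, such as RetroSim, a similarity-based method, from 35.7 to 51.8% top-1 accuracy and NeuralSym, a deep learning method, from 45.7 to 51.3%, and also that re-ranking the union of two models’ suggestions can lead to better performance than either alone. However, the state-of-the-art top-1 accuracy is not improved by this method.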


Bibliographic Details
Main Authors:	Lin, Min H.; Tu, Zhengkai; Coley, Connor W.
Other Authors:	Massachusetts Institute of Technology. Department of Chemical Engineering
Format:	Article
Language:	English
Published:	Springer International Publishing 2022
Online Access:	https://hdl.handle.net/1721.1/141316

_version_	1826210958034337792
author	Lin, Min H. Tu, Zhengkai Coley, Connor W.
author2	Massachusetts Institute of Technology. Department of Chemical Engineering
author_facet	Massachusetts Institute of Technology. Department of Chemical Engineering Lin, Min H. Tu, Zhengkai Coley, Connor W.
author_sort	Lin, Min H.
collection	MIT
description	Abstract Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given product, sets of reactants that can be used to synthesise that product. However, their performance as measured by the top-N accuracy in matching published reaction precedents still leaves room for improvement. This work aims to enhance these models by learning to re-rank their reactant predictions. Specifically, we design and train an energy-based model to re-rank, for each product, the published reaction as the top suggestion and the remaining reactant predictions as lower-ranked. We show that re-ranking can improve one-step models significantly using the standard USPTO-50k benchmark dataset, such as RetroSim, a similarity-based method, from 35.7 to 51.8% top-1 accuracy and NeuralSym, a deep learning method, from 45.7 to 51.3%, and also that re-ranking the union of two models’ suggestions can lead to better performance than either alone. However, the state-of-the-art top-1 accuracy is not improved by this method.
first_indexed	2024-09-23T14:58:09Z
format	Article
id	mit-1721.1/141316
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T14:58:09Z
publishDate	2022
publisher	Springer International Publishing
record_format	dspace
spelling	mit-1721.1/141316 2023-02-09T20:12:40Z Improving the performance of models for one-step retrosynthesis through re-ranking Lin, Min H. Tu, Zhengkai Coley, Connor W. Massachusetts Institute of Technology. Department of Chemical Engineering Abstract Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given product, sets of reactants that can be used to synthesise that product. However, their performance as measured by the top-N accuracy in matching published reaction precedents still leaves room for improvement. This work aims to enhance these models by learning to re-rank their reactant predictions. Specifically, we design and train an energy-based model to re-rank, for each product, the published reaction as the top suggestion and the remaining reactant predictions as lower-ranked. We show that re-ranking can improve one-step models significantly using the standard USPTO-50k benchmark dataset, such as RetroSim, a similarity-based method, from 35.7 to 51.8% top-1 accuracy and NeuralSym, a deep learning method, from 45.7 to 51.3%, and also that re-ranking the union of two models’ suggestions can lead to better performance than either alone. However, the state-of-the-art top-1 accuracy is not improved by this method. 2022-03-21T12:56:08Z 2022-03-21T12:56:08Z 2022-03-15 2022-03-20T04:15:26Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/141316 Journal of Cheminformatics. 2022 Mar 15;14(1):15 PUBLISHER_CC en https://doi.org/10.1186/s13321-022-00594-8 Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/ The Author(s) application/pdf Springer International Publishing Springer International Publishing
spellingShingle	Lin, Min H. Tu, Zhengkai Coley, Connor W. Improving the performance of models for one-step retrosynthesis through re-ranking
title	Improving the performance of models for one-step retrosynthesis through re-ranking
title_full	Improving the performance of models for one-step retrosynthesis through re-ranking
title_fullStr	Improving the performance of models for one-step retrosynthesis through re-ranking
title_full_unstemmed	Improving the performance of models for one-step retrosynthesis through re-ranking
title_short	Improving the performance of models for one-step retrosynthesis through re-ranking
title_sort	improving the performance of models for one step retrosynthesis through re ranking
url	https://hdl.handle.net/1721.1/141316
work_keys_str_mv	AT linminh improvingtheperformanceofmodelsforonestepretrosynthesisthroughreranking AT tuzhengkai improvingtheperformanceofmodelsforonestepretrosynthesisthroughreranking AT coleyconnorw improvingtheperformanceofmodelsforonestepretrosynthesisthroughreranking


Similar Items