Iterative experimental design based on active machine learning reduces the experimental burden associated with reaction screening

High-throughput reaction screening has emerged as a useful means of rapidly identifying the influence of key reaction variables on reaction outcomes. We show that active machine learning can further this objective by eliminating dependence on "exhaustive" screens (screens in which all possible combinations of the reaction variables of interest are examined). This is achieved through iterative selection of maximally informative experiments from the subset of all possible experiments in the domain. These experiments can be used to train accurate machine learning models that can be used to predict the outcomes of reactions that were not performed, thus reducing the overall experimental burden. To demonstrate our approach, we conduct retrospective analyses of the preexisting results of high-throughput reaction screening experiments. We compare the test set errors of models trained on actively-selected reactions to models trained on reactions selected at random from the same domain. We find that the degree to which models trained on actively-selected data outperform models trained on randomly-selected data depends on the domain being modeled, with it being possible to achieve very low test set errors when the dataset is heavily skewed in favor of low- or zero-yielding reactions. Our results confirm that this algorithm is a useful experiment planning tool that can change the reaction screening paradigm, by allowing medicinal and process chemists to focus their reaction screening efforts on the generation of a small amount of high-quality data.

Bibliographic Details
Main Authors:	Eyke, Natalie S., Green Jr, William H, Jensen, Klavs F
Other Authors:	Massachusetts Institute of Technology. Department of Chemical Engineering
Format:	Article
Language:	English
Published:	Royal Society of Chemistry (RSC) 2021
Citation:	Eyke, Natalie S. et al. “Iterative experimental design based on active machine learning reduces the experimental burden associated with reaction screening.” Reaction Chemistry and Engineering 5, 10 (August 2020): 1963–1972 © 2020 The Author(s)
DOI:	10.1039/D0RE00232A
ISSN:	2058-9883
License:	Creative Commons Attribution Noncommercial 3.0 unported license (https://creativecommons.org/licenses/by-nc/3.0/)
Online Access:	https://hdl.handle.net/1721.1/129381
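
The abstract describes an iterative loop: fit a model to the reactions run so far, pick the most informative untested reaction, "run" it, repeat, and finally compare test set error against a model trained on randomly chosen reactions. The sketch below illustrates that retrospective comparison under loudly stated assumptions: a made-up three-variable domain with invented yields, a bootstrap ensemble of k-nearest-neighbour regressors, and ensemble variance as the measure of informativeness. All names (`run_campaign`, `synthetic_yield`, `ensemble_variance`, etc.) are hypothetical, and none of this is claimed to be the model or selection criterion used in the paper.

```python
import itertools
import random
import statistics

# Hypothetical sketch only; not the authors' model, data, or selection rule.
# Toy domain: every combination of three categorical variables.
# An "exhaustive" screen would run all 4 x 4 x 3 = 48 reactions.
CATALYSTS = ["cat_A", "cat_B", "cat_C", "cat_D"]
LIGANDS = ["lig_1", "lig_2", "lig_3", "lig_4"]
SOLVENTS = ["sol_x", "sol_y", "sol_z"]
DOMAIN = list(itertools.product(CATALYSTS, LIGANDS, SOLVENTS))


def synthetic_yield(rxn):
    """Invented 'true' yield (%) standing in for previously recorded screen results."""
    cat, lig, sol = rxn
    base = {"cat_A": 0.0, "cat_B": 10.0, "cat_C": 40.0, "cat_D": 5.0}[cat]
    bonus = 30.0 if (cat, lig) == ("cat_C", "lig_2") else 0.0
    solvent_factor = {"sol_x": 1.0, "sol_y": 0.5, "sol_z": 0.0}[sol]
    return (base + bonus) * solvent_factor


def knn_predict(train, rxn, k=3):
    """k-nearest-neighbour regression; distance = number of differing variables."""
    ranked = sorted(train, key=lambda pair: sum(a != b for a, b in zip(pair[0], rxn)))
    return statistics.fmean(y for _, y in ranked[:k])


def ensemble_variance(train, rxn, rng, n_models=10):
    """Disagreement of a bootstrap ensemble of kNN models at one candidate reaction."""
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # resample with replacement
        preds.append(knn_predict(sample, rxn))
    return statistics.pvariance(preds)


def run_campaign(strategy, budget=16, n_initial=4, n_test=12, seed=0):
    """Retrospective campaign; returns (selected experiments, test-set MAE)."""
    if strategy not in ("active", "random"):
        raise ValueError(f"unknown strategy: {strategy}")
    rng = random.Random(seed)
    reactions = list(DOMAIN)
    rng.shuffle(reactions)
    # Same seed -> same held-out test set and same initial experiments for both strategies.
    test, pool = reactions[:n_test], reactions[n_test:]
    train = [(rxn, synthetic_yield(rxn)) for rxn in pool[:n_initial]]
    untested = pool[n_initial:]
    while len(train) < budget:
        if strategy == "active":
            # Query the candidate on which the ensemble members disagree most.
            scores = [ensemble_variance(train, rxn, rng) for rxn in untested]
            pick = untested[scores.index(max(scores))]
        else:
            pick = rng.choice(untested)
        untested.remove(pick)
        # "Running" the experiment = looking up its recorded outcome.
        train.append((pick, synthetic_yield(pick)))
    errors = [abs(knn_predict(train, rxn) - synthetic_yield(rxn)) for rxn in test]
    return train, statistics.fmean(errors)


for strategy in ("active", "random"):
    chosen, mae = run_campaign(strategy)
    print(f"{strategy}: {len(chosen)} experiments, test MAE = {mae:.1f} % yield")
```

Because the yields are invented, the printed errors say nothing about the datasets analysed in the paper. The point is the structure of a fair retrospective comparison: both strategies share the same held-out reactions and the same starting experiments, and differ only in how each subsequent experiment is chosen.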

