Iterative experimental design based on active machine learning reduces the experimental burden associated with reaction screening
High-throughput reaction screening has emerged as a useful means of rapidly identifying the influence of key reaction variables on reaction outcomes. We show that active machine learning can further this objective by eliminating dependence on "exhaustive" screens (screens in which all possi...
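Main Authors: | Eyke, Natalie S., Green Jr, William H, Jensen, Klavs F |
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Chemical Engineering |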
Format: | Article |
Language: | English |
Published: | Royal Society of Chemistry (RSC) 2021 |
Online Access: | https://hdl.handle.net/1721.1/129381 |
_version_ | 1826197862048858112 |
---|---|
author | Eyke, Natalie S. Green Jr, William H Jensen, Klavs F |
author2 | Massachusetts Institute of Technology. Department of Chemical Engineering |
author_facet | Massachusetts Institute of Technology. Department of Chemical Engineering Eyke, Natalie S. Green Jr, William H Jensen, Klavs F |
author_sort | Eyke, Natalie S. |
collection | MIT |
description | High-throughput reaction screening has emerged as a useful means of rapidly identifying the influence of key reaction variables on reaction outcomes. We show that active machine learning can further this objective by eliminating dependence on "exhaustive" screens (screens in which all possible combinations of the reaction variables of interest are examined). This is achieved through iterative selection of maximally informative experiments from the subset of all possible experiments in the domain. These experiments can be used to train accurate machine learning models that can be used to predict the outcomes of reactions that were not performed, thus reducing the overall experimental burden. To demonstrate our approach, we conduct retrospective analyses of the preexisting results of high-throughput reaction screening experiments. We compare the test set errors of models trained on actively-selected reactions to models trained on reactions selected at random from the same domain. We find that the degree to which models trained on actively-selected data outperform models trained on randomly-selected data depends on the domain being modeled, with it being possible to achieve very low test set errors when the dataset is heavily skewed in favor of low- or zero-yielding reactions. Our results confirm that this algorithm is a useful experiment planning tool that can change the reaction screening paradigm, by allowing medicinal and process chemists to focus their reaction screening efforts on the generation of a small amount of high-quality data. |
first_indexed | 2024-09-23T10:54:38Z |
format | Article |
id | mit-1721.1/129381 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T10:54:38Z |
publishDate | 2021 |
publisher | Royal Society of Chemistry (RSC) |
record_format | dspace |
spelling | mit-1721.1/129381 2022-09-30T23:51:41Z 2021-01-12T15:37:34Z 2021-01-12T15:37:34Z 2020-08 2020-06 2020-12-21T15:14:29Z Article http://purl.org/eprint/type/JournalArticle 2058-9883 https://hdl.handle.net/1721.1/129381 Eyke, Natalie S. et al. “Iterative experimental design based on active machine learning reduces the experimental burden associated with reaction screening.” Reaction Chemistry and Engineering 5, 10 (August 2020): 1963–1972 © 2020 The Author(s) en 10.1039/D0RE00232A Reaction Chemistry and Engineering Creative Commons Attribution Noncommercial 3.0 unported license https://creativecommons.org/licenses/by-nc/3.0/ application/pdf Royal Society of Chemistry (RSC) |
title | Iterative experimental design based on active machine learning reduces the experimental burden associated with reaction screening |
url | https://hdl.handle.net/1721.1/129381 |
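
The retrospective comparison described in the abstract (actively selected versus randomly selected training reactions, scored on the reactions that were not "performed") can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 6 x 6 synthetic domain, the skewed yields, the nearest-neighbour model, and the bootstrap-committee variance used as the "informativeness" score. The record does not state which model or selection criterion the authors used, so this should not be read as their method.

```python
import itertools
import random
import statistics


def make_domain():
    """Hypothetical two-variable screen: every (a, b) combination plus a synthetic yield."""
    grid = list(itertools.product(range(6), range(6)))
    # Assumed, heavily skewed yields: zero everywhere except one corner of the grid.
    yields = [max(0.0, 20.0 * (a + b - 7)) for a, b in grid]
    return grid, yields


def knn_predict(train, x, k=3):
    """Mean yield of the k labelled points closest to x (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2
    )[:k]
    return statistics.fmean([y for _, y in nearest])


def committee_predictions(train, x, rng, n_models=10):
    """Predictions for x from models fit on bootstrap resamples of the labelled set."""
    return [
        knn_predict([rng.choice(train) for _ in train], x) for _ in range(n_models)
    ]


def run_screen(grid, yields, budget, active, seed=0, n_init=4):
    """Select `budget` experiments, then report mean absolute error on the rest."""
    rng = random.Random(seed)
    pool = list(range(len(grid)))
    rng.shuffle(pool)
    labelled = [pool.pop() for _ in range(n_init)]  # random starting experiments
    while len(labelled) < budget:
        train = [(grid[i], yields[i]) for i in labelled]
        if active:
            # Assumed criterion: pick the candidate the committee disagrees on most.
            pick = max(
                pool,
                key=lambda i: statistics.pvariance(
                    committee_predictions(train, grid[i], rng)
                ),
            )
        else:
            pick = rng.choice(pool)  # baseline: random selection
        pool.remove(pick)
        labelled.append(pick)  # "run" the experiment by revealing its known yield
    train = [(grid[i], yields[i]) for i in labelled]
    errors = [abs(knn_predict(train, grid[i]) - yields[i]) for i in pool]
    return labelled, statistics.fmean(errors)
```

Calling `run_screen(grid, yields, budget=12, active=True)` and again with `active=False` on the output of `make_domain()` gives the two test-set errors to compare. As the abstract notes, how much the active strategy helps depends on the domain, so this toy example makes no claim about which strategy wins.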