Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors

© The Royal Society of Chemistry 2021. Accurate and rapid evaluation of whether substrates can undergo the desired transformation is crucial and challenging for both human knowledge and computer predictions. Despite the potential of machine learning in predicting chemical reactivity such as sele...


Bibliographic Details
Main Authors: Guan, Yanfei, Coley, Connor W, Wu, Haoyang, Ranasinghe, Duminda, Heid, Esther, Struble, Thomas James, Pattanaik, Lagnajit, Green, William H, Jensen, Klavs F
Format: Article
Language: English
Published: Royal Society of Chemistry (RSC) 2021
Online Access: https://hdl.handle.net/1721.1/133475
collection MIT
description © The Royal Society of Chemistry 2021. Accurate and rapid evaluation of whether substrates can undergo the desired transformation is crucial and challenging for both human knowledge and computer predictions. Despite the potential of machine learning in predicting chemical reactivity such as selectivity, popular feature engineering and learning methods are either time-consuming or data-hungry. We introduce a new method that combines machine-learned reaction representation with selected quantum mechanical descriptors to predict regio-selectivity in general substitution reactions. We construct a reactivity descriptor database based on ab initio calculations of 130k organic molecules, and train a multi-task constrained model to calculate demanded descriptors on-the-fly. The proposed platform enhances the inter/extra-polated performance for regio-selectivity predictions and enables learning from small datasets with just hundreds of examples. Furthermore, the proposed protocol is demonstrated to be generally applicable to a diverse range of chemical spaces. For three general types of substitution reactions (aromatic C-H functionalization, aromatic C-X substitution, and other substitution reactions) curated from a commercial database, the fusion model achieves 89.7%, 96.7%, and 97.2% top-1 accuracy in predicting the major outcome, respectively, each using 5000 training reactions. Using predicted descriptors, the fusion model is end-to-end, and requires only approximately 70 ms per reaction to predict the selectivity from reaction SMILES strings.
institution Massachusetts Institute of Technology
doi 10.1039/d0sc04823b
journal Chemical Science
license Creative Commons Attribution 3.0 unported license https://creativecommons.org/licenses/by/3.0/