Quantum Chemistry Meets Machine Learning: Autonomous Computational Workflow for Chemical Discovery

Bibliographic Details
Main Author: Duan, Chenru
Other Authors: Kulik, Heather J.
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/147418
Description
Summary: Automation has been revolutionizing modern society since the first industrial revolution and has the potential to provide sufficient productivity forces for common prosperity. A similar revolution is ongoing in computational sciences. Quantum chemistry software and modern computers have developed to a stage where virtual high-throughput screening (VHTS), i.e., running thousands of calculations in parallel, has become possible. This provides great opportunities for developing automated workflows that use the increasing computing power to generate large-scale data sets. Together with machine learning (ML) models trained on these data sets as either surrogate function approximations or generative models, accelerated chemical discovery of functional molecules and materials is achieved. Current automation workflows, however, are far from perfect. Namely, they produce too many unfruitful results and suffer severely from method selection bias, especially in challenging chemical spaces such as transition metal chemistry. These problems keep automated workflows from providing the efficiency and accuracy needed for chemical discovery. In this thesis, we introduce intelligent ML-based decision-making models into automation workflows. We build the first set of classifiers that predict the likelihood of calculation success, monitoring running calculations on the fly and terminating them if they are predicted to fail with high confidence. These classifiers are highly transferable and stay accurate (i.e., >95%) throughout the geometry optimization process, saving more than half of the computational resources. We develop the first semi-supervised learning classifier to identify strong static correlation in a system, achieving state-of-the-art performance for this task.
Therefore, we can pre-determine which systems require more expensive (yet more accurate) correlated wavefunction theory calculations, thus improving overall data accuracy without adding unnecessary computational cost. We also propose an approach that uses the consensus among multiple density functional approximations (DFAs) to discover robust (i.e., DFA-insensitive) candidate compounds, which are in much better agreement with experimentally observed leads. Lastly, we build a DFA recommender that selects the DFA with the lowest expected error relative to the reference in a system-dependent manner, achieving the accuracy needed for inorganic chemical discovery. All these ML-based decision-making models are integrated into workflows for VHTS. We anticipate that these “smart” computational workflows will be key to autonomous chemical discovery.
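
The DFA consensus and DFA recommender ideas in the summary can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the thresholds (`tolerance`, `max_spread`), the placeholder functional labels (`DFA_1` to `DFA_3`), and all numeric values are hypothetical and chosen only for illustration. The sketch assumes that per-DFA property predictions and per-DFA expected errors are already available from some upstream model.

```python
from statistics import pstdev


def robust_candidates(predictions, target, tolerance, max_spread):
    """Return compounds that every DFA places near the target value.

    predictions: {compound: {dfa_name: predicted_value}} (hypothetical layout)
    """
    selected = []
    for compound, by_dfa in predictions.items():
        values = list(by_dfa.values())
        # Consensus: every functional puts the compound inside the target window.
        all_agree = all(abs(v - target) <= tolerance for v in values)
        # Insensitivity: the predictions vary little across functionals.
        if all_agree and pstdev(values) <= max_spread:
            selected.append(compound)
    return selected


def recommend_dfa(expected_errors):
    """Pick the DFA with the lowest expected error for one system.

    expected_errors: {dfa_name: expected_absolute_error}, assumed to come
    from a separately trained model (not shown here).
    """
    return min(expected_errors, key=expected_errors.get)


# Illustrative placeholder values, not results from the thesis.
predictions = {
    "complex_A": {"DFA_1": 1.9, "DFA_2": 2.1, "DFA_3": 2.0},
    "complex_B": {"DFA_1": 2.0, "DFA_2": 3.5, "DFA_3": 0.4},
    "complex_C": {"DFA_1": 4.1, "DFA_2": 4.0, "DFA_3": 4.2},
}
leads = robust_candidates(predictions, target=2.0, tolerance=0.5, max_spread=0.2)
best = recommend_dfa({"DFA_1": 3.2, "DFA_2": 5.1, "DFA_3": 2.4})
```

In this toy setup only `complex_A` is kept, because it is the only compound that all three placeholder functionals place inside the target window with a small spread. The recommender reduces to an argmin over per-system expected errors; the hard part, estimating those errors, is outside the scope of this sketch.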