Data-Driven Approaches Can Overcome the Cost–Accuracy Trade-Off in Multireference Diagnostics

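Copyright © 2020 American Chemical Society.

High-throughput computational screening typically employs methods (e.g., density functional theory, or DFT) that can fail to describe challenging molecules, such as those with strongly correlated electronic structure. In such cases, multireference (MR) correlated wavefunction theory (WFT) would be the appropriate choice, but it remains more challenging to carry out and automate than single-reference (SR) WFT or DFT. Numerous diagnostics have been proposed for identifying when MR character is likely to affect the predictive power of SR calculations, but studies on small data sets have reached conflicting conclusions about diagnostic performance. We compute 15 MR diagnostics, ranging from affordable DFT-based to more costly MR-WFT-based diagnostics, on a set of 3165 equilibrium and distorted small organic molecules containing up to six heavy atoms. Over this set, we also observe conflicting MR character assignments and low pairwise linear correlations among diagnostics. We evaluate the ability of existing diagnostics to predict the percent recovery of the correlation energy, %Ecorr. None of the DFT-based diagnostics is nearly as predictive of %Ecorr as the best WFT-based diagnostics. To overcome this cost–accuracy trade-off, we develop machine learning (ML, i.e., kernel ridge regression) models that predict WFT-based diagnostics from a combination of DFT-based diagnostics and a new, size-independent 3D geometric representation. The ML-predicted diagnostics correlate as well with MR effects as their WFT-computed values, significantly improving over the DFT-based diagnostics on which the models were trained. These ML models thus provide a promising approach to improve upon DFT-based diagnostic accuracy while remaining low-cost enough for high-throughput screening.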
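The abstract names two quantitative ingredients: a target, %Ecorr, and kernel ridge regression (KRR) models that map cheap DFT-based inputs to a WFT-based diagnostic. The sketch below illustrates both under stated assumptions. The record does not give the %Ecorr formula, the kernel, the hyperparameters, or the feature set. Here %Ecorr is assumed to be the share of the CCSD(T) correlation energy recovered by CCSD, the kernel is assumed to be Gaussian (RBF), and the features and targets are synthetic placeholders, not data from the study. `percent_ecorr`, `rbf_kernel`, and `KernelRidge` are hypothetical helpers, not the authors' code.

```python
import numpy as np


def percent_ecorr(e_hf, e_ccsd, e_ccsd_t):
    """Percent of the CCSD(T) correlation energy recovered by CCSD.

    ASSUMED definition (hypothetical helper): 100 * (E_CCSD - E_HF) / (E_CCSD(T) - E_HF).
    """
    return 100.0 * (e_ccsd - e_hf) / (e_ccsd_t - e_hf)


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||a_i - b_j||^2) between rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative distances caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class KernelRidge:
    """Minimal kernel ridge regression: solve (K + alpha*I) c = y, then predict k(x)^T c."""

    def __init__(self, alpha=1e-2, gamma=1.0):
        self.alpha = alpha  # ridge (regularization) strength, assumed value
        self.gamma = gamma  # inverse squared length scale of the RBF kernel, assumed value

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        n = K.shape[0]
        # Closed-form dual coefficients of kernel ridge regression.
        self.dual_coef = np.linalg.solve(K + self.alpha * np.eye(n), np.asarray(y, dtype=float))
        return self

    def predict(self, X):
        K_new = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K_new @ self.dual_coef


# Toy usage with SYNTHETIC data (not data from the study).
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))       # placeholder for DFT-based diagnostics + geometric features
y = np.sin(X[:, 0]) + X[:, 1] ** 2  # placeholder for a WFT-based diagnostic
model = KernelRidge(alpha=1e-2, gamma=1.0).fit(X[:40], y[:40])
y_pred = model.predict(X[40:])      # ML-predicted diagnostic for held-out points
pearson_r = np.corrcoef(y_pred, y[40:])[0, 1]  # linear correlation with the reference values
```

The held-out Pearson correlation mirrors, in toy form, the abstract's comparison of how well predicted versus computed diagnostics track MR effects. The same `np.corrcoef` call applied to a matrix of diagnostics gives their pairwise linear correlations. For real work, scikit-learn's `sklearn.kernel_ridge.KernelRidge` (with `kernel="rbf"`) provides the same closed-form solve along with hyperparameter tooling.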

Bibliographic Details
Main Authors: Duan, Chenru; Liu, Fang; Nandy, Aditya; Kulik, Heather J.
Other Authors: Massachusetts Institute of Technology. Department of Chemical Engineering
Format: Article
Language: English
Published: American Chemical Society (ACS) 2021
Online Access: https://hdl.handle.net/1721.1/134401
Other Authors: Massachusetts Institute of Technology. Department of Chemistry
Journal: Journal of Chemical Theory and Computation
Date Issued: 2020
DOI: 10.1021/ACS.JCTC.0C00358
License: Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/)