A Machine Learning Force Field for Bio-Macromolecular Modeling Based on Quantum Chemistry-Calculated Interaction Energy Datasets

Accurate energy data from noncovalent interactions are essential for constructing force fields for molecular dynamics simulations of bio-macromolecular systems. There are two important practical issues in the construction of a reliable force field with the hope of balancing the desired chemical accu...

Bibliographic Details
Main Authors: Zhen-Xuan Fan, Sheng D. Chao
Format: Article
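Language: English
Published: MDPI AG 2024-01-01
Series: Bioengineering
Subjects: noncovalent interactions; machine learning force fields; symmetry-adapted perturbation theory; ab initio energy datasets; artificial intelligence
Online Access: https://www.mdpi.com/2306-5354/11/1/51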
_version_ 1827372706877669376
author Zhen-Xuan Fan
Sheng D. Chao
collection DOAJ
description Accurate energy data from noncovalent interactions are essential for constructing force fields for molecular dynamics simulations of bio-macromolecular systems. There are two important practical issues in the construction of a reliable force field with the hope of balancing the desired chemical accuracy and working efficiency. One is to determine a suitable quantum chemistry level of theory for calculating interaction energies. The other is to use a suitable continuous energy function to model the quantum chemical energy data. For the first issue, we have recently calculated the intermolecular interaction energies using the SAPT0 level of theory, and we have systematically organized these energies into the ab initio SOFG-31 (homodimer) and SOFG-31-heterodimer datasets. In this work, we re-calculate these interaction energies by using the more advanced SAPT2 level of theory with a wider series of basis sets. Our purpose is to determine the SAPT level of theory proper for interaction energies with respect to the CCSD(T)/CBS benchmark chemical accuracy. Next, to utilize these energy datasets, we employ one of the well-developed machine learning techniques, called the CLIFF scheme, to construct a general-purpose force field for biomolecular dynamics simulations. Here we use the SOFG-31 dataset and the SOFG-31-heterodimer dataset as the training and test sets, respectively. Our results demonstrate that using the CLIFF scheme can reproduce a diverse range of dimeric interaction energy patterns with only a small training set. The overall errors for each SAPT energy component, as well as the SAPT total energy, are all well below the desired chemical accuracy of ~1 kcal/mol.
first_indexed 2024-03-08T11:04:56Z
format Article
id doaj.art-00de13b1587e4363ab5a2d7c44abbffb
institution Directory Open Access Journal
issn 2306-5354
language English
last_indexed 2024-03-08T11:04:56Z
publishDate 2024-01-01
publisher MDPI AG
record_format Article
series Bioengineering
spelling doaj.art-00de13b1587e4363ab5a2d7c44abbffb
2024-01-26T15:06:17Z
eng
MDPI AG
Bioengineering
2306-5354
2024-01-01
Volume 11, Issue 1, Article 51
10.3390/bioengineering11010051
A Machine Learning Force Field for Bio-Macromolecular Modeling Based on Quantum Chemistry-Calculated Interaction Energy Datasets
Zhen-Xuan Fan (Institute of Applied Mechanics, National Taiwan University, Taipei 106, Taiwan)
Sheng D. Chao (Institute of Applied Mechanics, National Taiwan University, Taipei 106, Taiwan)
https://www.mdpi.com/2306-5354/11/1/51
noncovalent interactions
machine learning force fields
symmetry-adapted perturbation theory
ab initio energy datasets
artificial intelligence
title A Machine Learning Force Field for Bio-Macromolecular Modeling Based on Quantum Chemistry-Calculated Interaction Energy Datasets
topic noncovalent interactions
machine learning force fields
symmetry-adapted perturbation theory
ab initio energy datasets
artificial intelligence
url https://www.mdpi.com/2306-5354/11/1/51