A Machine Learning Force Field for Bio-Macromolecular Modeling Based on Quantum Chemistry-Calculated Interaction Energy Datasets
Accurate energy data from noncovalent interactions are essential for constructing force fields for molecular dynamics simulations of bio-macromolecular systems. There are two important practical issues in the construction of a reliable force field with the hope of balancing the desired chemical accu...
Main Authors: | Zhen-Xuan Fan, Sheng D. Chao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-01-01 |
Series: | Bioengineering |
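Subjects: | noncovalent interactions; machine learning force fields; symmetry-adapted perturbation theory; ab initio energy datasets; artificial intelligence |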
Online Access: | https://www.mdpi.com/2306-5354/11/1/51 |
_version_ | 1827372706877669376 |
---|---|
author | Zhen-Xuan Fan; Sheng D. Chao |
author_facet | Zhen-Xuan Fan; Sheng D. Chao |
author_sort | Zhen-Xuan Fan |
collection | DOAJ |
description | Accurate energy data from noncovalent interactions are essential for constructing force fields for molecular dynamics simulations of bio-macromolecular systems. There are two important practical issues in the construction of a reliable force field with the hope of balancing the desired chemical accuracy and working efficiency. One is to determine a suitable quantum chemistry level of theory for calculating interaction energies. The other is to use a suitable continuous energy function to model the quantum chemical energy data. For the first issue, we have recently calculated the intermolecular interaction energies using the SAPT0 level of theory, and we have systematically organized these energies into the ab initio SOFG-31 (homodimer) and SOFG-31-heterodimer datasets. In this work, we re-calculate these interaction energies by using the more advanced SAPT2 level of theory with a wider series of basis sets. Our purpose is to determine the SAPT level of theory proper for interaction energies with respect to the CCSD(T)/CBS benchmark chemical accuracy. Next, to utilize these energy datasets, we employ one of the well-developed machine learning techniques, called the CLIFF scheme, to construct a general-purpose force field for biomolecular dynamics simulations. Here we use the SOFG-31 dataset and the SOFG-31-heterodimer dataset as the training and test sets, respectively. Our results demonstrate that using the CLIFF scheme can reproduce a diverse range of dimeric interaction energy patterns with only a small training set. The overall errors for each SAPT energy component, as well as the SAPT total energy, are all well below the desired chemical accuracy of ~1 kcal/mol. |
first_indexed | 2024-03-08T11:04:56Z |
format | Article |
id | doaj.art-00de13b1587e4363ab5a2d7c44abbffb |
institution | Directory Open Access Journal |
issn | 2306-5354 |
language | English |
last_indexed | 2024-03-08T11:04:56Z |
publishDate | 2024-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Bioengineering |
spelling | doaj.art-00de13b1587e4363ab5a2d7c44abbffb; 2024-01-26T15:06:17Z; eng; MDPI AG; Bioengineering; 2306-5354; 2024-01-01; 11; 1; 51; 10.3390/bioengineering11010051; A Machine Learning Force Field for Bio-Macromolecular Modeling Based on Quantum Chemistry-Calculated Interaction Energy Datasets; Zhen-Xuan Fan (Institute of Applied Mechanics, National Taiwan University, Taipei 106, Taiwan); Sheng D. Chao (Institute of Applied Mechanics, National Taiwan University, Taipei 106, Taiwan); https://www.mdpi.com/2306-5354/11/1/51; noncovalent interactions; machine learning force fields; symmetry-adapted perturbation theory; ab initio energy datasets; artificial intelligence |
title | A Machine Learning Force Field for Bio-Macromolecular Modeling Based on Quantum Chemistry-Calculated Interaction Energy Datasets |
title_sort | machine learning force field for bio macromolecular modeling based on quantum chemistry calculated interaction energy datasets |
topic | noncovalent interactions; machine learning force fields; symmetry-adapted perturbation theory; ab initio energy datasets; artificial intelligence |
url | https://www.mdpi.com/2306-5354/11/1/51 |