Impact of training and validation data on the performance of neural network potentials: A case study on carbon using the CA-9 dataset
The use of machine learning to accelerate computer simulations is on the rise. In atomistic simulations, the use of machine learning interatomic potentials (ML-IAPs) can significantly reduce computational costs while maintaining accuracy close to that of ab initio methods. To achieve this, ML-IAPs are trained on large datasets of images, which are atomistic configurations labeled with data from ab initio calculations. Focusing on carbon, we use deep learning to train neural network potentials (NNPs), a form of ML-IAP, based on the state-of-the-art end-to-end NNP architecture SchNet and investigate how the choice of training and validation data affects the performance of the NNPs. Training is performed on the CA-9 dataset, a 9-carbon allotrope dataset constructed using data obtained via ab initio molecular dynamics (AIMD). Our results show that image generation with AIMD causes a high degree of similarity between the generated images, which has a detrimental effect on the performance of the NNPs. However, by carefully choosing which images from the dataset are included in the training and validation data, this effect can be mitigated. We conclude by benchmarking our trained NNPs in applications such as relaxation and phonon calculation, where we can reproduce ab initio results with high accuracy.
Main Authors: | Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2021-04-01 |
Series: | Carbon Trends |
Subjects: | CA-9; Dataset; Machine learning; Interatomic potential; Carbon; Neural network potential |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2667056921000043 |
_version_ | 1819068012852936704 |
---|---|
author | Daniel Hedman; Tom Rothe; Gustav Johansson; Fredrik Sandin; J. Andreas Larsson; Yoshiyuki Miyamoto
author_sort | Daniel Hedman |
collection | DOAJ |
description | The use of machine learning to accelerate computer simulations is on the rise. In atomistic simulations, the use of machine learning interatomic potentials (ML-IAPs) can significantly reduce computational costs while maintaining accuracy close to that of ab initio methods. To achieve this, ML-IAPs are trained on large datasets of images, which are atomistic configurations labeled with data from ab initio calculations. Focusing on carbon, we use deep learning to train neural network potentials (NNPs), a form of ML-IAP, based on the state-of-the-art end-to-end NNP architecture SchNet and investigate how the choice of training and validation data affects the performance of the NNPs. Training is performed on the CA-9 dataset, a 9-carbon allotrope dataset constructed using data obtained via ab initio molecular dynamics (AIMD). Our results show that image generation with AIMD causes a high degree of similarity between the generated images, which has a detrimental effect on the performance of the NNPs. However, by carefully choosing which images from the dataset are included in the training and validation data, this effect can be mitigated. We conclude by benchmarking our trained NNPs in applications such as relaxation and phonon calculation, where we can reproduce ab initio results with high accuracy. |
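The abstract's central point is that consecutive AIMD snapshots are highly similar, so naively splitting a trajectory into training and validation data hurts NNP performance. The sketch below is not taken from the paper; it illustrates one generic way such redundancy can be reduced, using greedy farthest-point sampling on per-image feature vectors (the function name and the use of plain Euclidean distance on arbitrary feature vectors are assumptions for illustration only):

```python
import numpy as np

def select_diverse_images(features, n_select, seed=0):
    """Greedy farthest-point sampling: repeatedly pick the image whose
    feature vector is farthest from all images selected so far, so that
    near-duplicate consecutive AIMD snapshots are mostly skipped.
    `features` has shape (n_images, n_features)."""
    rng = np.random.default_rng(seed)
    n_images = features.shape[0]
    # Seed the selection with a random image.
    selected = [int(rng.integers(n_images))]
    # Distance from every image to its nearest selected image.
    dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < n_select:
        nxt = int(np.argmax(dist))  # most dissimilar remaining image
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected

# Toy example: 100 "images" described by 8-dimensional feature vectors.
feats = np.random.default_rng(1).normal(size=(100, 8))
train_idx = select_diverse_images(feats, n_select=20)
```

In practice the feature vectors would come from a structural descriptor of each atomistic configuration; the selection criterion, not the descriptor, is the point of the sketch.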
first_indexed | 2024-12-21T16:27:23Z |
format | Article |
id | doaj.art-615594ec73e84241b8121878a041b0df |
institution | Directory Open Access Journal |
issn | 2667-0569 |
language | English |
last_indexed | 2024-12-21T16:27:23Z |
publishDate | 2021-04-01 |
publisher | Elsevier |
record_format | Article |
series | Carbon Trends |
spelling | Carbon Trends, Elsevier (ISSN 2667-0569), 2021-04-01, Vol. 3, Article 100027.
Author affiliations:
Daniel Hedman (corresponding author): Research Center for Computational Design of Advanced Functional Materials, National Institute of Advanced Industrial Science and Technology (AIST), Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan; Applied Physics, Division of Materials Science, Department of Engineering Sciences and Mathematics, Luleå University of Technology, Luleå SE-971 87, Sweden
Tom Rothe: Institute of Physics, Faculty of Natural Sciences, Chemnitz University of Technology, Chemnitz 09126, Germany
Gustav Johansson: Applied Physics, Division of Materials Science, Department of Engineering Sciences and Mathematics, Luleå University of Technology, Luleå SE-971 87, Sweden
Fredrik Sandin: Machine Learning, Embedded Intelligent Systems Lab, Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, Luleå SE-971 87, Sweden
J. Andreas Larsson: Applied Physics, Division of Materials Science, Department of Engineering Sciences and Mathematics, Luleå University of Technology, Luleå SE-971 87, Sweden
Yoshiyuki Miyamoto: Research Center for Computational Design of Advanced Functional Materials, National Institute of Advanced Industrial Science and Technology (AIST), Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
title | Impact of training and validation data on the performance of neural network potentials: A case study on carbon using the CA-9 dataset |
topic | CA-9; Dataset; Machine learning; Interatomic potential; Carbon; Neural network potential
url | http://www.sciencedirect.com/science/article/pii/S2667056921000043 |