Impact of training and validation data on the performance of neural network potentials: A case study on carbon using the CA-9 dataset


Bibliographic Details
Main Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Format: Article
Language: English
Published: Elsevier 2021-04-01
Series: Carbon Trends
Subjects: CA-9; Dataset; Machine learning; Interatomic potential; Carbon; Neural network potential
Online Access: http://www.sciencedirect.com/science/article/pii/S2667056921000043
collection DOAJ
description The use of machine learning to accelerate computer simulations is on the rise. In atomistic simulations, the use of machine learning interatomic potentials (ML-IAPs) can significantly reduce computational costs while maintaining accuracy close to that of ab initio methods. To achieve this, ML-IAPs are trained on large datasets of images, which are atomistic configurations labeled with data from ab initio calculations. Focusing on carbon, we use deep learning to train neural network potentials (NNPs), a form of ML-IAP, based on the state-of-the-art end-to-end NNP architecture SchNet and investigate how the choice of training and validation data affects the performance of the NNPs. Training is performed on the CA-9 dataset, a 9-carbon allotrope dataset constructed using data obtained via ab initio molecular dynamics (AIMD). Our results show that image generation with AIMD causes a high degree of similarity between the generated images, which has a detrimental effect on the performance of the NNPs. But by carefully choosing which images from the dataset are included in the training and validation data, this effect can be mitigated. We conclude by benchmarking our trained NNPs in applications such as relaxation and phonon calculation, where we can reproduce ab initio results with high accuracy.
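The description's central finding is that consecutive AIMD frames are nearly identical, and that performance improves when the training and validation images are chosen deliberately rather than taken as raw trajectory slices. One common way to realize such a selection is farthest-point sampling on per-image feature vectors; the sketch below is illustrative only (the function name, feature representation, and split sizes are assumptions, not the paper's actual procedure):

```python
import numpy as np

def farthest_point_split(features, n_train, n_val, seed=0):
    """Greedily pick mutually dissimilar images for training and validation.

    features : one feature vector per image (e.g. a structural descriptor);
               consecutive AIMD frames map to nearly identical vectors, so
               greedy farthest-point sampling suppresses that redundancy.
    Returns (train_indices, val_indices) into the original image list.
    """
    X = np.asarray(features, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]            # arbitrary starting image
    # distance from every image to its nearest already-selected image
    d = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < n_train + n_val:
        nxt = int(np.argmax(d))                  # most dissimilar remaining image
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return selected[:n_train], selected[n_train:]
```

Because every pick maximizes the distance to the already-chosen set, near-duplicate frames from the same stretch of an AIMD trajectory cannot dominate either split, which is the effect the abstract says must be mitigated.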
id doaj.art-615594ec73e84241b8121878a041b0df
institution Directory Open Access Journal
issn 2667-0569
citation Carbon Trends 3 (2021) 100027
affiliations
Daniel Hedman (corresponding author): Research Center for Computational Design of Advanced Functional Materials, National Institute of Advanced Industrial Science and Technology (AIST), Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan; Applied Physics, Division of Materials Science, Department of Engineering Sciences and Mathematics, Luleå University of Technology, Luleå SE-971 87, Sweden
Tom Rothe: Institute of Physics, Faculty of Natural Sciences, Chemnitz University of Technology, Chemnitz 09126, Germany
Gustav Johansson: Applied Physics, Division of Materials Science, Department of Engineering Sciences and Mathematics, Luleå University of Technology, Luleå SE-971 87, Sweden
Fredrik Sandin: Machine Learning, Embedded Intelligent Systems Lab, Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, Luleå SE-971 87, Sweden
J. Andreas Larsson: Applied Physics, Division of Materials Science, Department of Engineering Sciences and Mathematics, Luleå University of Technology, Luleå SE-971 87, Sweden
Yoshiyuki Miyamoto: Research Center for Computational Design of Advanced Functional Materials, National Institute of Advanced Industrial Science and Technology (AIST), Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
topic CA-9
Dataset
Machine learning
Interatomic potential
Carbon
Neural network potential