Impact of training and validation data on the performance of neural network potentials: A case study on carbon using the CA-9 dataset


Bibliographic Details
Main Authors: Daniel Hedman, Tom Rothe, Gustav Johansson, Fredrik Sandin, J. Andreas Larsson, Yoshiyuki Miyamoto
Format: Article
Language: English
Published: Elsevier 2021-04-01
Series: Carbon Trends
Subjects: CA-9; Dataset; Machine learning; Interatomic potential; Carbon; Neural network potential
Online Access: http://www.sciencedirect.com/science/article/pii/S2667056921000043
collection DOAJ
description The use of machine learning to accelerate computer simulations is on the rise. In atomistic simulations, the use of machine learning interatomic potentials (ML-IAPs) can significantly reduce computational costs while maintaining accuracy close to that of ab initio methods. To achieve this, ML-IAPs are trained on large datasets of images, which are atomistic configurations labeled with data from ab initio calculations. Focusing on carbon, we use deep learning to train neural network potentials (NNPs), a form of ML-IAP, based on the state-of-the-art end-to-end NNP architecture SchNet and investigate how the choice of training and validation data affects the performance of the NNPs. Training is performed on the CA-9 dataset, a 9-carbon allotrope dataset constructed using data obtained via ab initio molecular dynamics (AIMD). Our results show that image generation with AIMD causes a high degree of similarity between the generated images, which has a detrimental effect on the performance of the NNPs. But by carefully choosing which images from the dataset are included in the training and validation data, this effect can be mitigated. We conclude by benchmarking our trained NNPs in applications such as relaxation and phonon calculation, where we can reproduce ab initio results with high accuracy.
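The description's central finding is that consecutive AIMD frames are nearly identical, and that performance improves when the training and validation images are chosen deliberately rather than taken as raw trajectory slices. One common way to realize such a selection is farthest-point sampling on per-image feature vectors; the sketch below is illustrative only (the function name, feature representation, and split sizes are assumptions, not the paper's actual procedure):

```python
import numpy as np

def farthest_point_split(features, n_train, n_val, seed=0):
    """Greedily pick mutually dissimilar images for training and validation.

    features : one feature vector per image (e.g. a structural descriptor);
               consecutive AIMD frames map to nearly identical vectors, so
               greedy farthest-point sampling suppresses that redundancy.
    Returns (train_indices, val_indices) into the original image list.
    """
    X = np.asarray(features, dtype=float)
    n = len(X)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]            # arbitrary starting image
    # distance from every image to its nearest already-selected image
    d = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < n_train + n_val:
        nxt = int(np.argmax(d))                  # most dissimilar remaining image
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return selected[:n_train], selected[n_train:]
```

Because every pick maximizes the distance to the already-chosen set, near-duplicate frames from the same stretch of an AIMD trajectory cannot dominate either split, which is the effect the abstract says must be mitigated.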
id doaj.art-615594ec73e84241b8121878a041b0df
institution Directory Open Access Journal
issn 2667-0569
citation Carbon Trends 3 (2021) 100027
affiliations
Daniel Hedman (corresponding author): Research Center for Computational Design of Advanced Functional Materials, National Institute of Advanced Industrial Science and Technology (AIST), Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan; Applied Physics, Division of Materials Science, Department of Engineering Sciences and Mathematics, Luleå University of Technology, Luleå SE-971 87, Sweden
Tom Rothe: Institute of Physics, Faculty of Natural Sciences, Chemnitz University of Technology, Chemnitz 09126, Germany
Gustav Johansson: Applied Physics, Division of Materials Science, Department of Engineering Sciences and Mathematics, Luleå University of Technology, Luleå SE-971 87, Sweden
Fredrik Sandin: Machine Learning, Embedded Intelligent Systems Lab, Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, Luleå SE-971 87, Sweden
J. Andreas Larsson: Applied Physics, Division of Materials Science, Department of Engineering Sciences and Mathematics, Luleå University of Technology, Luleå SE-971 87, Sweden
Yoshiyuki Miyamoto: Research Center for Computational Design of Advanced Functional Materials, National Institute of Advanced Industrial Science and Technology (AIST), Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
topic CA-9
Dataset
Machine learning
Interatomic potential
Carbon
Neural network potential