Calibration of uncertainty in the active learning of machine learning force fields

FFLUX is a machine learning force field that uses the maximum expected prediction error (MEPE) active learning algorithm to improve the efficiency of model training. MEPE uses the predictive uncertainty of a Gaussian process (GP) to balance exploration and exploitation when selecting the next training sample. However, the predictive uncertainty of a GP is unlikely to be accurate or precise immediately after training. We hypothesize that calibrating the uncertainty quantification within MEPE will improve active learning performance. We develop and test two methods to improve uncertainty estimates: post-hoc calibration of predictive uncertainty using the CRUDE algorithm, and replacing the GP with a Student-t process. We investigate the impact of these methods on MEPE for single-sample and batch-sample active learning. Our findings suggest that post-hoc calibration does not improve the performance of active learning using the MEPE method. However, we do find that the Student-t process can outperform active learning strategies and random sampling using a GP if the training set is sufficiently large.


Bibliographic Details
Main Authors: Adam Thomas-Mitchell, Glenn Hawe, Paul L A Popelier
Format: Article
Language: English
Published: IOP Publishing 2023-01-01
Series: Machine Learning: Science and Technology
Subjects:
Online Access: https://doi.org/10.1088/2632-2153/ad0ab5
_version_ 1797505156438294528
author Adam Thomas-Mitchell
Glenn Hawe
Paul L A Popelier
author_facet Adam Thomas-Mitchell
Glenn Hawe
Paul L A Popelier
author_sort Adam Thomas-Mitchell
collection DOAJ
description FFLUX is a machine learning force field that uses the maximum expected prediction error (MEPE) active learning algorithm to improve the efficiency of model training. MEPE uses the predictive uncertainty of a Gaussian process (GP) to balance exploration and exploitation when selecting the next training sample. However, the predictive uncertainty of a GP is unlikely to be accurate or precise immediately after training. We hypothesize that calibrating the uncertainty quantification within MEPE will improve active learning performance. We develop and test two methods to improve uncertainty estimates: post-hoc calibration of predictive uncertainty using the CRUDE algorithm, and replacing the GP with a Student-t process. We investigate the impact of these methods on MEPE for single-sample and batch-sample active learning. Our findings suggest that post-hoc calibration does not improve the performance of active learning using the MEPE method. However, we do find that the Student-t process can outperform active learning strategies and random sampling using a GP if the training set is sufficiently large.
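The abstract describes selecting the next training sample using the predictive uncertainty of a GP. As an illustration only, the following is a minimal numpy-only sketch of the exploration component of such an acquisition step: fit a GP posterior and pick the pool configuration with the largest posterior variance. The RBF kernel, its hyperparameters, and the function names are assumptions for this sketch; the actual MEPE score used in FFLUX also weighs an exploitation term and is not reproduced here.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(a, b) = s^2 exp(-|a - b|^2 / (2 l^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-d2 / (2.0 * lengthscale ** 2))

def gp_posterior(X_train, y_train, X_query, noise=1e-6):
    # Standard GP regression posterior: mean and marginal variance at X_query.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_query)          # cross-covariance, (n_train, n_query)
    Kss = rbf_kernel(X_query, X_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - (v ** 2).sum(0)
    return mean, np.maximum(var, 0.0)          # clip tiny negative round-off

def select_next(X_train, y_train, X_pool):
    # Pure-exploration acquisition: the pool point the GP is least certain about.
    _, var = gp_posterior(X_train, y_train, X_pool)
    return int(np.argmax(var))
```

In a toy 1-D setting, a pool point far from all training data keeps close to the prior variance and is selected, while a point between training samples has its variance sharply reduced; this is the exploration pressure that, per the abstract, an uncalibrated GP may misjudge immediately after training.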
first_indexed 2024-03-10T04:14:35Z
format Article
id doaj.art-f47c0606007a4148ba0f1eed6f686799
institution Directory Open Access Journal
issn 2632-2153
language English
last_indexed 2024-03-10T04:14:35Z
publishDate 2023-01-01
publisher IOP Publishing
record_format Article
series Machine Learning: Science and Technology
spelling doaj.art-f47c0606007a4148ba0f1eed6f686799 (2023-11-23T08:05:00Z)
eng; IOP Publishing; Machine Learning: Science and Technology; ISSN 2632-2153; 2023-01-01; vol. 4, no. 4, 045034; doi:10.1088/2632-2153/ad0ab5
Calibration of uncertainty in the active learning of machine learning force fields
Adam Thomas-Mitchell (https://orcid.org/0009-0001-7977-5313), School of Computing, Ulster University, 2-24 York Street, BT15 1AP Belfast, United Kingdom
Glenn Hawe (https://orcid.org/0000-0002-0590-8494), School of Computing, Ulster University, 2-24 York Street, BT15 1AP Belfast, United Kingdom
Paul L A Popelier (https://orcid.org/0000-0001-9053-1363), Department of Chemistry, The University of Manchester, Oxford Road, M13 9PL Manchester, United Kingdom
FFLUX is a machine learning force field that uses the maximum expected prediction error (MEPE) active learning algorithm to improve the efficiency of model training. MEPE uses the predictive uncertainty of a Gaussian process (GP) to balance exploration and exploitation when selecting the next training sample. However, the predictive uncertainty of a GP is unlikely to be accurate or precise immediately after training. We hypothesize that calibrating the uncertainty quantification within MEPE will improve active learning performance. We develop and test two methods to improve uncertainty estimates: post-hoc calibration of predictive uncertainty using the CRUDE algorithm, and replacing the GP with a Student-t process. We investigate the impact of these methods on MEPE for single-sample and batch-sample active learning. Our findings suggest that post-hoc calibration does not improve the performance of active learning using the MEPE method. However, we do find that the Student-t process can outperform active learning strategies and random sampling using a GP if the training set is sufficiently large.
https://doi.org/10.1088/2632-2153/ad0ab5
machine learning force fields; Gaussian process; calibration; active learning; uncertainty quantification
spellingShingle Adam Thomas-Mitchell
Glenn Hawe
Paul L A Popelier
Calibration of uncertainty in the active learning of machine learning force fields
Machine Learning: Science and Technology
machine learning force fields
Gaussian process
calibration
active learning
uncertainty quantification
title Calibration of uncertainty in the active learning of machine learning force fields
title_full Calibration of uncertainty in the active learning of machine learning force fields
title_fullStr Calibration of uncertainty in the active learning of machine learning force fields
title_full_unstemmed Calibration of uncertainty in the active learning of machine learning force fields
title_short Calibration of uncertainty in the active learning of machine learning force fields
title_sort calibration of uncertainty in the active learning of machine learning force fields
topic machine learning force fields
Gaussian process
calibration
active learning
uncertainty quantification
url https://doi.org/10.1088/2632-2153/ad0ab5
work_keys_str_mv AT adamthomasmitchell calibrationofuncertaintyintheactivelearningofmachinelearningforcefields
AT glennhawe calibrationofuncertaintyintheactivelearningofmachinelearningforcefields
AT paullapopelier calibrationofuncertaintyintheactivelearningofmachinelearningforcefields