Optimal criteria and their asymptotic form for data selection in data-driven reduced-order modelling with Gaussian process regression

We derive criteria for the selection of datapoints used for data-driven reduced-order modelling and other areas of supervised learning based on Gaussian process regression (GPR). While this is a well-studied area in the fields of active learning and optimal experimental design, most criteria in the...

Bibliographic Details
Main Authors:	Sapsis, Themistoklis P.; Blanchard, Antoine
Other Authors:	Massachusetts Institute of Technology. Department of Mechanical Engineering
Format:	Article
Language:	English
Published:	The Royal Society 2024
Subjects:	General Physics and Astronomy; General Engineering; General Mathematics
Online Access:	https://hdl.handle.net/1721.1/154218

_version_	1826191932484747264
author	Sapsis, Themistoklis P.; Blanchard, Antoine
author2	Massachusetts Institute of Technology. Department of Mechanical Engineering
author_facet	Massachusetts Institute of Technology. Department of Mechanical Engineering; Sapsis, Themistoklis P.; Blanchard, Antoine
author_sort	Sapsis, Themistoklis P.
collection	MIT
description	We derive criteria for the selection of datapoints used for data-driven reduced-order modelling and other areas of supervised learning based on Gaussian process regression (GPR). While this is a well-studied area in the fields of active learning and optimal experimental design, most criteria in the literature are empirical. Here we introduce an optimality condition for the selection of a new input defined as the minimizer of the distance between the approximated output probability density function (pdf) of the reduced-order model and the exact one. Given that the exact pdf is unknown, we define the selection criterion as the supremum over the unit sphere of the native Hilbert space for the GPR. The resulting selection criterion, however, has a form that is difficult to compute. We combine results from GPR theory and asymptotic analysis to derive a computable form of the defined optimality criterion that is valid in the limit of small predictive variance. The derived asymptotic form of the selection criterion leads to convergence of the GPR model that guarantees a balanced distribution of data resources between probable and large-deviation outputs, resulting in an effective way of sampling towards data-driven reduced-order modelling.
first_indexed	2024-09-23T09:03:35Z
format	Article
id	mit-1721.1/154218
institution	Massachusetts Institute of Technology
language	English
last_indexed	2025-02-19T04:17:36Z
publishDate	2024
publisher	The Royal Society
record_format	dspace
spelling	mit-1721.1/154218 2024-12-23T05:53:14Z 2024-04-18T17:38:13Z 2024-04-18T17:38:13Z 2022-06-20 2024-04-18T17:34:39Z Article http://purl.org/eprint/type/JournalArticle 1364-503X 1471-2962 https://hdl.handle.net/1721.1/154218 Sapsis, Themistoklis P. and Blanchard, Antoine. 2022. "Optimal criteria and their asymptotic form for data selection in data-driven reduced-order modelling with Gaussian process regression." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 380 (2229). en 10.1098/rsta.2021.0197 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences Creative Commons Attribution-Noncommercial-ShareAlike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf The Royal Society arxiv
title	Optimal criteria and their asymptotic form for data selection in data-driven reduced-order modelling with Gaussian process regression
topic	General Physics and Astronomy; General Engineering; General Mathematics
url	https://hdl.handle.net/1721.1/154218

Similar Items