Output-weighted sampling for multi-armed bandits with extreme payoffs

We present a new type of acquisition function for online decision-making in multi-armed and contextual bandit problems with extreme payoffs. Specifically, we model the payoff function as a Gaussian process and formulate a novel type of upper confidence bound acquisition function that guides explorat...

Bibliographic Details
Main Authors:	Yang, Yibo; Blanchard, Antoine; Sapsis, Themistoklis; Perdikaris, Paris
Format:	Article
Language:	English
Published:	The Royal Society 2024
Subjects:	General Physics and Astronomy; General Engineering; General Mathematics
Online Access:	https://hdl.handle.net/1721.1/154219
