Output-weighted sampling for multi-armed bandits with extreme payoffs

We present a new type of acquisition function for online decision-making in multi-armed and contextual bandit problems with extreme payoffs. Specifically, we model the payoff function as a Gaussian process and formulate a novel type of upper confidence bound acquisition function that guides explorat...

Bibliographic Details
Main Authors: Yang, Yibo; Blanchard, Antoine; Sapsis, Themistoklis; Perdikaris, Paris
Format: Article
Language: English
Published: The Royal Society 2024
Online Access: https://hdl.handle.net/1721.1/154219

Similar Items