Multi-armed linear bandits with latent biases

In a linear stochastic bandit model, each arm corresponds to a vector in Euclidean space, and the expected return observed at each time step is determined by an unknown linear function of the selected arm. This paper addresses the challenge of identifying the optimal arm in a linear stochastic bandi...
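The linear reward model described in the abstract can be sketched roughly as follows. This is only an illustration of the standard setting (each arm is a vector, and the expected return is an unknown linear function of the arm); it does not cover the latent-bias component named in the title, since the abstract is truncated here. The arm vectors, the parameter `THETA`, the Gaussian noise, and all function names are hypothetical choices made for the example.

```python
import random

def expected_reward(arm, theta):
    """Expected return of an arm: inner product of the arm vector with theta."""
    return sum(a * t for a, t in zip(arm, theta))

def pull(arm, theta, rng, noise_std=0.1):
    """One noisy observation: linear expected reward plus Gaussian noise (assumed noise model)."""
    return expected_reward(arm, theta) + rng.gauss(0.0, noise_std)

def best_arm(arms, theta):
    """Index of the arm with the largest expected reward under theta."""
    return max(range(len(arms)), key=lambda i: expected_reward(arms[i], theta))

# Hypothetical example values; in a real bandit problem THETA is unknown to the learner.
ARMS = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
THETA = (0.5, 0.8)

if __name__ == "__main__":
    rng = random.Random(0)
    for i, arm in enumerate(ARMS):
        print(i, expected_reward(arm, THETA), pull(arm, THETA, rng))
    print("best arm index:", best_arm(ARMS, THETA))
```

In this toy instance the learner would only see the noisy values returned by `pull`, and would have to infer which arm is best from those observations.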

Bibliographic Details
Main Authors: Kang, Qiyu; Tay, Wee Peng; She, Rui; Wang, Sijie; Liu, Xiaoqian; Yang, Yuan-Rui
Other Authors: School of Electrical and Electronic Engineering
Format: Journal Article
Language: English
Published: 2024
Online Access: https://hdl.handle.net/10356/175416