Multi-armed linear bandits with latent biases
In a linear stochastic bandit model, each arm corresponds to a vector in Euclidean space, and the expected return observed at each time step is determined by an unknown linear function of the selected arm. This paper addresses the challenge of identifying the optimal arm in a linear stochastic bandi...
Main Authors: | , , , , , |
---|---|
Other Authors: | |
Format: | Journal Article |
Language: | English |
Published: |
2024
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/175416 |