Contextual bandits with cross-learning
© 2019 Neural Information Processing Systems Foundation. All rights reserved. In the classical contextual bandits problem, in each round t, a learner observes some context c, chooses some action a to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem where in addition...
Format: | Article |
---|---|
Language: | English |
Published: | 2021 |
Online Access: | https://hdl.handle.net/1721.1/137415 |