A Structured Multiarmed Bandit Problem and the Greedy Policy
We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a sequence of arms that maximizes the expected total (or discounted total) reward. We demonstrate the effectiveness of a greed...
Main Authors: | , , |
---|---|
Other Authors: | |
Format: | Document |
Language: | en_US |
Published: | Institute of Electrical and Electronics Engineers, 2010 |
Subjects: | |
Online Access: | http://hdl.handle.net/1721.1/54813 https://orcid.org/0000-0003-2658-8239 |