A Structured Multiarmed Bandit Problem and the Greedy Policy

We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a sequence of arms that maximizes the expected total (or discounted total) reward. We demonstrate the effectiveness of a greed...

Bibliographic Details
Main Authors:	Rusmevichientong, Paat; Mersereau, Adam J.; Tsitsiklis, John N.
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	Institute of Electrical and Electronics Engineers 2010
Subjects:	Markov decision process (MDP)
Online Access:	http://hdl.handle.net/1721.1/54813 https://orcid.org/0000-0003-2658-8239
