Undiscounted bandit games
We analyze undiscounted continuous-time games of strategic experimentation with two-armed bandits. The risky arm generates payoffs according to a Lévy process with an unknown average payoff per unit of time which nature draws from an arbitrary finite set. Observing all actions and realized payoffs, players use Markov strategies with the common posterior belief about the unknown parameter as the state variable. We show that the unique symmetric Markov perfect equilibrium can be computed in a simple closed form involving only the payoff of the safe arm, the expected current payoff of the risky arm, and the expected full-information payoff, given the current belief. In particular, the equilibrium does not depend on the precise specification of the payoff-generating processes.
Main Authors: | Keller, G; Rady, S |
---|---|
Format: | Working paper |
Published: | University of Oxford, 2019 |
---|---|
author | Keller, G Rady, S |
collection | OXFORD |
description | We analyze undiscounted continuous-time games of strategic experimentation with two-armed bandits. The risky arm generates payoffs according to a Lévy process with an unknown average payoff per unit of time which nature draws from an arbitrary finite set. Observing all actions and realized payoffs, players use Markov strategies with the common posterior belief about the unknown parameter as the state variable. We show that the unique symmetric Markov perfect equilibrium can be computed in a simple closed form involving only the payoff of the safe arm, the expected current payoff of the risky arm, and the expected full-information payoff, given the current belief. In particular, the equilibrium does not depend on the precise specification of the payoff-generating processes. |
format | Working paper |
id | oxford-uuid:d1c3ff41-dfac-432f-b982-cc7dd62c25cd |
institution | University of Oxford |
publishDate | 2019 |
publisher | University of Oxford |
record_format | dspace |
title | Undiscounted bandit games |
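The abstract states that the equilibrium depends only on three quantities given the current belief: the safe arm's payoff, the risky arm's expected current payoff, and the expected full-information payoff. As a purely illustrative sketch (not the paper's own code or notation), these quantities can be computed in the simplest case where the risky arm's unknown average payoff takes one of two values, `m0` or `m1`, and the safe arm pays a known flow `s`; the names `p`, `s`, `m0`, `m1` are assumptions for this example only:

```python
def expected_current_payoff(p, m0, m1):
    """Expected flow payoff of the risky arm at belief p = P(high state)."""
    return p * m1 + (1 - p) * m0

def full_information_payoff(p, s, m0, m1):
    """Expected payoff if the state were revealed: each player would then
    simply use whichever arm pays more in the realized state."""
    return p * max(m1, s) + (1 - p) * max(m0, s)

# Illustrative numbers: safe arm pays s = 1; risky arm pays m1 = 2 if good,
# m0 = 0 if bad; current belief that the risky arm is good is p = 0.4.
p, s, m0, m1 = 0.4, 1.0, 0.0, 2.0
m = expected_current_payoff(p, m0, m1)       # 0.4 * 2 + 0.6 * 0 = 0.8
f = full_information_payoff(p, s, m0, m1)    # 0.4 * 2 + 0.6 * 1 = 1.4
```

Note that `f` always weakly exceeds both `s` and `m`, since knowing the state lets a player pick the better arm; the gap between `f` and `max(s, m)` is the value of resolving the uncertainty, which is what experimentation with the risky arm can deliver.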