Undiscounted bandit games
We analyze undiscounted continuous-time games of strategic experimentation with two-armed bandits. The risky arm generates payoffs according to a Lévy process with an unknown average payoff per unit of time which nature draws from an arbitrary finite set. Observing all actions and realized payoffs, players use Markov strategies with the common posterior belief about the unknown parameter as the state variable. We show that the unique symmetric Markov perfect equilibrium can be computed in a simple closed form involving only the payoff of the safe arm, the expected current payoff of the risky arm, and the expected full-information payoff, given the current belief. In particular, the equilibrium does not depend on the precise specification of the payoff-generating processes.
Main Authors: | Keller, G; Rady, S |
---|---|
Format: | Working paper |
Published: | University of Oxford, 2019 |
---|---|
author | Keller, G Rady, S |
collection | OXFORD |
description | We analyze undiscounted continuous-time games of strategic experimentation with two-armed bandits. The risky arm generates payoffs according to a Lévy process with an unknown average payoff per unit of time which nature draws from an arbitrary finite set. Observing all actions and realized payoffs, players use Markov strategies with the common posterior belief about the unknown parameter as the state variable. We show that the unique symmetric Markov perfect equilibrium can be computed in a simple closed form involving only the payoff of the safe arm, the expected current payoff of the risky arm, and the expected full-information payoff, given the current belief. In particular, the equilibrium does not depend on the precise specification of the payoff-generating processes. |
format | Working paper |
id | oxford-uuid:d1c3ff41-dfac-432f-b982-cc7dd62c25cd |
institution | University of Oxford |
publishDate | 2019 |
publisher | University of Oxford |
record_format | dspace |
title | Undiscounted bandit games |
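The abstract states that the equilibrium depends only on three quantities given the current belief: the safe arm's payoff, the risky arm's expected current payoff, and the expected full-information payoff. As a purely illustrative sketch (not the paper's own code or notation), these quantities can be computed in the simplest case where the risky arm's unknown average payoff takes one of two values, `m0` or `m1`, and the safe arm pays a known flow `s`; the names `p`, `s`, `m0`, `m1` are assumptions for this example only:

```python
def expected_current_payoff(p, m0, m1):
    """Expected flow payoff of the risky arm at belief p = P(high state)."""
    return p * m1 + (1 - p) * m0

def full_information_payoff(p, s, m0, m1):
    """Expected payoff if the state were revealed: each player would then
    simply use whichever arm pays more in the realized state."""
    return p * max(m1, s) + (1 - p) * max(m0, s)

# Illustrative numbers: safe arm pays s = 1; risky arm pays m1 = 2 if good,
# m0 = 0 if bad; current belief that the risky arm is good is p = 0.4.
p, s, m0, m1 = 0.4, 1.0, 0.0, 2.0
m = expected_current_payoff(p, m0, m1)       # 0.4 * 2 + 0.6 * 0 = 0.8
f = full_information_payoff(p, s, m0, m1)    # 0.4 * 2 + 0.6 * 1 = 1.4
```

Note that `f` always weakly exceeds both `s` and `m`, since knowing the state lets a player pick the better arm; the gap between `f` and `max(s, m)` is the value of resolving the uncertainty, which is what experimentation with the risky arm can deliver.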