Finding the Best Dueler

Consider a set of <i>n</i> players. We suppose that each game involves two players, that there is some unknown player who wins each game it plays with a probability greater than <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"...

Full description

Bibliographic Details
Main Authors:	Zhengu Zhang, Sheldon M. Ross
Format:	Article
Language:	English
Published:	MDPI AG 2023-03-01
Series:	Mathematics
Subjects:	best arm identification dueling bandit
Online Access:	https://www.mdpi.com/2227-7390/11/7/1568

Description
Summary:	Consider a set of <i>n</i> players. We suppose that each game involves two players, that there is some unknown player who wins each game it plays with a probability greater than <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo></mrow></semantics></math></inline-formula> and that our objective is to determine this best player. Under the requirement that the policy employed guarantees a correct choice with a probability of at least some specified value, we look for a policy that has a relatively small expected number of games played before decision. We consider this problem both under the assumption that the best player wins each game with a probability of at least some specified value <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>p</mi><mn>0</mn></msub><mo>></mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo></mrow></semantics></math></inline-formula> and under a Bayesian assumption that the probability that player <i>i</i> wins a game against player <i>j</i> is <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mfrac><msub><mi>v</mi><mi>i</mi></msub><mrow><msub><mi>v</mi><mi>i</mi></msub><mo>+</mo><msub><mi>v</mi><mi>j</mi></msub></mrow></mfrac></semantics></math></inline-formula>, where <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>v</mi><mn>1</mn></msub><mo>,</mo><mo>…</mo><mo>,</mo><msub><mi>v</mi><mi>n</mi></msub></mrow></semantics></math></inline-formula> are the unknown values of <i>n</i> independent and identically distributed exponential random variables. In the former case, we propose a policy where chosen pairs play a match that ends when one of them has had a specified number of wins more than the other; in the latter case, we propose a Thompson sampling type rule.
ISSN:	2227-7390

Finding the Best Dueler

Similar Items