Finding the Best Dueler

Consider a set of <i>n</i> players. We suppose that each game involves two players, that there is some unknown player who wins each game it plays with a probability greater than <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"...

Full description

Bibliographic Details
Main Authors: Zhengu Zhang, Sheldon M. Ross
Format: Article
Language:English
Published: MDPI AG 2023-03-01
Series:Mathematics
Subjects:
Online Access:https://www.mdpi.com/2227-7390/11/7/1568
_version_ 1797607465471180800
author Zhengu Zhang
Sheldon M. Ross
author_facet Zhengu Zhang
Sheldon M. Ross
author_sort Zhengu Zhang
collection DOAJ
description Consider a set of <i>n</i> players. We suppose that each game involves two players, that there is some unknown player who wins each game it plays with a probability greater than <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo></mrow></semantics></math></inline-formula> and that our objective is to determine this best player. Under the requirement that the policy employed guarantees a correct choice with a probability of at least some specified value, we look for a policy that has a relatively small expected number of games played before decision. We consider this problem both under the assumption that the best player wins each game with a probability of at least some specified value <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>p</mi><mn>0</mn></msub><mo>></mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo></mrow></semantics></math></inline-formula> and under a Bayesian assumption that the probability that player <i>i</i> wins a game against player <i>j</i> is <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mfrac><msub><mi>v</mi><mi>i</mi></msub><mrow><msub><mi>v</mi><mi>i</mi></msub><mo>+</mo><msub><mi>v</mi><mi>j</mi></msub></mrow></mfrac></semantics></math></inline-formula>, where <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>v</mi><mn>1</mn></msub><mo>,</mo><mo>…</mo><mo>,</mo><msub><mi>v</mi><mi>n</mi></msub></mrow></semantics></math></inline-formula> are the unknown values of <i>n</i> independent and identically distributed exponential random variables. In the former case, we propose a policy where chosen pairs play a match that ends when one of them has had a specified number of wins more than the other; in the latter case, we propose a Thompson sampling type rule.
first_indexed 2024-03-11T05:30:22Z
format Article
id doaj.art-1011476ad9fb4c28bc28ae1f6081e0c0
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-11T05:30:22Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-1011476ad9fb4c28bc28ae1f6081e0c02023-11-17T17:07:28ZengMDPI AGMathematics2227-73902023-03-01117156810.3390/math11071568Finding the Best DuelerZhengu Zhang0Sheldon M. Ross1Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089, USADepartment of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089, USAConsider a set of <i>n</i> players. We suppose that each game involves two players, that there is some unknown player who wins each game it plays with a probability greater than <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo></mrow></semantics></math></inline-formula> and that our objective is to determine this best player. Under the requirement that the policy employed guarantees a correct choice with a probability of at least some specified value, we look for a policy that has a relatively small expected number of games played before decision. We consider this problem both under the assumption that the best player wins each game with a probability of at least some specified value <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>p</mi><mn>0</mn></msub><mo>></mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo></mrow></semantics></math></inline-formula> and under a Bayesian assumption that the probability that player <i>i</i> wins a game against player <i>j</i> is <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mfrac><msub><mi>v</mi><mi>i</mi></msub><mrow><msub><mi>v</mi><mi>i</mi></msub><mo>+</mo><msub><mi>v</mi><mi>j</mi></msub></mrow></mfrac></semantics></math></inline-formula>, where <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>v</mi><mn>1</mn></msub><mo>,</mo><mo>…</mo><mo>,</mo><msub><mi>v</mi><mi>n</mi></msub></mrow></semantics></math></inline-formula> are the unknown values of <i>n</i> independent and identically distributed exponential random variables. In the former case, we propose a policy where chosen pairs play a match that ends when one of them has had a specified number of wins more than the other; in the latter case, we propose a Thompson sampling type rule.https://www.mdpi.com/2227-7390/11/7/1568best arm identificationdueling bandit
spellingShingle Zhengu Zhang
Sheldon M. Ross
Finding the Best Dueler
Mathematics
best arm identification
dueling bandit
title Finding the Best Dueler
title_full Finding the Best Dueler
title_fullStr Finding the Best Dueler
title_full_unstemmed Finding the Best Dueler
title_short Finding the Best Dueler
title_sort finding the best dueler
topic best arm identification
dueling bandit
url https://www.mdpi.com/2227-7390/11/7/1568
work_keys_str_mv AT zhenguzhang findingthebestdueler
AT sheldonmross findingthebestdueler