Finding the Best Dueler
Consider a set of <i>n</i> players. We suppose that each game involves two players, that there is some unknown player who wins each game it plays with a probability greater than <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2023-03-01
|
Series: | Mathematics |
Subjects: | |
Online Access: | https://www.mdpi.com/2227-7390/11/7/1568 |
_version_ | 1797607465471180800 |
---|---|
author | Zhengu Zhang Sheldon M. Ross |
author_facet | Zhengu Zhang Sheldon M. Ross |
author_sort | Zhengu Zhang |
collection | DOAJ |
description | Consider a set of <i>n</i> players. We suppose that each game involves two players, that there is some unknown player who wins each game it plays with a probability greater than <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo></mrow></semantics></math></inline-formula> and that our objective is to determine this best player. Under the requirement that the policy employed guarantees a correct choice with a probability of at least some specified value, we look for a policy that has a relatively small expected number of games played before decision. We consider this problem both under the assumption that the best player wins each game with a probability of at least some specified value <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>p</mi><mn>0</mn></msub><mo>></mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo></mrow></semantics></math></inline-formula> and under a Bayesian assumption that the probability that player <i>i</i> wins a game against player <i>j</i> is <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mfrac><msub><mi>v</mi><mi>i</mi></msub><mrow><msub><mi>v</mi><mi>i</mi></msub><mo>+</mo><msub><mi>v</mi><mi>j</mi></msub></mrow></mfrac></semantics></math></inline-formula>, where <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>v</mi><mn>1</mn></msub><mo>,</mo><mo>…</mo><mo>,</mo><msub><mi>v</mi><mi>n</mi></msub></mrow></semantics></math></inline-formula> are the unknown values of <i>n</i> independent and identically distributed exponential random variables. In the former case, we propose a policy where chosen pairs play a match that ends when one of them has had a specified number of wins more than the other; in the latter case, we propose a Thompson sampling type rule. |
first_indexed | 2024-03-11T05:30:22Z |
format | Article |
id | doaj.art-1011476ad9fb4c28bc28ae1f6081e0c0 |
institution | Directory Open Access Journal |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-11T05:30:22Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-1011476ad9fb4c28bc28ae1f6081e0c02023-11-17T17:07:28ZengMDPI AGMathematics2227-73902023-03-01117156810.3390/math11071568Finding the Best DuelerZhengu Zhang0Sheldon M. Ross1Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089, USADepartment of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089, USAConsider a set of <i>n</i> players. We suppose that each game involves two players, that there is some unknown player who wins each game it plays with a probability greater than <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo></mrow></semantics></math></inline-formula> and that our objective is to determine this best player. Under the requirement that the policy employed guarantees a correct choice with a probability of at least some specified value, we look for a policy that has a relatively small expected number of games played before decision. We consider this problem both under the assumption that the best player wins each game with a probability of at least some specified value <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>p</mi><mn>0</mn></msub><mo>></mo><mn>1</mn><mo>/</mo><mn>2</mn><mo>,</mo></mrow></semantics></math></inline-formula> and under a Bayesian assumption that the probability that player <i>i</i> wins a game against player <i>j</i> is <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mfrac><msub><mi>v</mi><mi>i</mi></msub><mrow><msub><mi>v</mi><mi>i</mi></msub><mo>+</mo><msub><mi>v</mi><mi>j</mi></msub></mrow></mfrac></semantics></math></inline-formula>, where <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>v</mi><mn>1</mn></msub><mo>,</mo><mo>…</mo><mo>,</mo><msub><mi>v</mi><mi>n</mi></msub></mrow></semantics></math></inline-formula> are the unknown values of <i>n</i> independent and identically distributed exponential random variables. In the former case, we propose a policy where chosen pairs play a match that ends when one of them has had a specified number of wins more than the other; in the latter case, we propose a Thompson sampling type rule.https://www.mdpi.com/2227-7390/11/7/1568best arm identificationdueling bandit |
spellingShingle | Zhengu Zhang Sheldon M. Ross Finding the Best Dueler Mathematics best arm identification dueling bandit |
title | Finding the Best Dueler |
title_full | Finding the Best Dueler |
title_fullStr | Finding the Best Dueler |
title_full_unstemmed | Finding the Best Dueler |
title_short | Finding the Best Dueler |
title_sort | finding the best dueler |
topic | best arm identification dueling bandit |
url | https://www.mdpi.com/2227-7390/11/7/1568 |
work_keys_str_mv | AT zhenguzhang findingthebestdueler AT sheldonmross findingthebestdueler |