Copeland dueling bandits

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB)...
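
As a rough illustration of the distinction drawn above, the following sketch computes Condorcet and Copeland winners from a known pairwise preference matrix. It is not the CCB algorithm from the paper. It assumes the standard definitions (an arm "beats" another when its probability of winning a duel exceeds 0.5; a Condorcet winner beats every other arm; a Copeland winner beats the largest number of other arms). The function names and example matrices are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]  # P[i][j] = assumed probability that arm i beats arm j


def copeland_scores(P: Matrix) -> List[int]:
    """Count, for each arm, how many other arms it beats (P[i][j] > 0.5)."""
    n = len(P)
    return [sum(1 for j in range(n) if j != i and P[i][j] > 0.5) for i in range(n)]


def condorcet_winner(P: Matrix) -> Optional[int]:
    """Return the arm that beats every other arm, or None if there is none."""
    n = len(P)
    for arm, score in enumerate(copeland_scores(P)):
        if score == n - 1:
            return arm
    return None


def copeland_winners(P: Matrix) -> List[int]:
    """Return all arms with the maximal Copeland score (never empty for n >= 1)."""
    scores = copeland_scores(P)
    best = max(scores)
    return [arm for arm, score in enumerate(scores) if score == best]


# Hypothetical cyclic preferences: 0 beats 1, 1 beats 2, 2 beats 0.
CYCLIC: Matrix = [
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.6],
    [0.6, 0.4, 0.5],
]

# Hypothetical transitive preferences: arm 0 beats both other arms.
TRANSITIVE: Matrix = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]
```

In the cyclic matrix no arm beats all others, so `condorcet_winner` returns `None`, while `copeland_winners` still returns a non-empty set (here a three-way tie). In the transitive matrix the two notions coincide on arm 0.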


Bibliographic details
Main authors: Zoghi, M, Karnin, Z, Whiteson, S, Rijke, M
Format: Conference item
Published: 2015
