Beta Upper Confidence Bound Policy for the Design of Clinical Trials

Bibliographic Details
Main Authors: Andrii Dzhoha, Iryna Rozora
Format: Article
Language: English
Published: Austrian Statistical Society 2023-08-01
Series: Austrian Journal of Statistics
Online Access: https://www.ajs.or.at/index.php/ajs/article/view/1751
Description
Summary: The multi-armed bandit problem is a classic example of the exploration-exploitation trade-off and is well suited to modeling sequential resource allocation under uncertainty. One of its typical motivating applications is adaptive design in clinical trials, which modifies the trial's course in accordance with a pre-specified objective by using the results accumulating in the trial. Since the response to a procedure in clinical trials is not immediate, multi-armed bandit policies must be adapted to delays to retain their theoretical guarantees. In this work, we show the importance of such adaptation by evaluating policies on the publicly available dataset of the International Stroke Trial, a randomized trial of aspirin and subcutaneous heparin among 19,435 patients with acute ischaemic stroke. In addition to the adapted policies, we analyze the Upper Confidence Bound policy with beta feedback, which mitigates delays when evidence on the certainty of successful treatment is available within a relatively short period after the procedure.
ISSN: 1026-597X