Summary: | The learning automaton (LA), a powerful tool in reinforcement learning, is of crucial importance for its adaptivity in stochastic environments and its applicability in various engineering fields. In particular, the LA adaptively searches for the optimal action, the one that maximizes the reward among all possible choices, by interacting with the environment. However, traditional LA frameworks have several limitations in practical applications, e.g., the cost of parameter tuning and difficulties in massive-action environments, which prevent them from being applied to time-sensitive and resource-constrained tasks. In this paper, we propose a novel LA framework based on statistical hypothesis testing, in which the actions are compared iteratively by statistical hypothesis tests, the suboptimal ones are dismissed, and the estimated optimal action is obtained. In addition, theoretical analyses of the framework are given to establish its ε-optimality. The proposed framework is also parameter-free and efficient in massive-action environments. Comprehensive simulations are conducted in both benchmark and massive-action environments to demonstrate the superiority of the proposed framework over the ordinary schemes.
|