<p>The ability to learn from data is crucial to developing satisfactory solutions to many complex problems, particularly in the design of intelligent agents that exist within, and interact with, a complex environment in pursuit of some goal. In this thesis we investigate several bottlenecks that can prevent such agents from learning and making progress in their respective environments. To do so we adopt the framework of Reinforcement Learning (RL), specifically Deep RL, in which deep neural networks are used to learn and represent the functions and quantities of interest.</p>
<p>The first bottleneck we investigate, in Part I, is the difficulty of exploration. An agent must first explore its environment, interacting with it to observe the consequences of its actions in a variety of situations, in order to gather enough data from which to infer good behaviour. This is of particular importance in sparse reward environments, where significant exploration is often required before the agent receives any meaningful signal related to its goal. In Chapter 3 we investigate how the principle of optimism in the face of uncertainty, which is prevalent in the tabular literature, can be incorporated into Deep RL algorithms.</p>
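<p>To make the idea of optimism in the face of uncertainty concrete, the following is a minimal illustrative sketch in which a count-based bonus is added to the extrinsic reward, so that rarely visited states look more attractive to the agent. The hashing scheme, the bonus form, and all names (hash_state, beta) are assumptions for illustration only, not the method developed in Chapter 3.</p>
<pre><code># Illustrative sketch of optimism via a count-based exploration bonus.
# All names (beta, hash_state, optimistic_reward) are placeholders,
# not the algorithm from the thesis.
from collections import defaultdict
import math

visit_counts = defaultdict(int)
beta = 0.1  # bonus scale (assumed hyperparameter)

def hash_state(state):
    """Map a (possibly continuous) observation to a hashable key."""
    return tuple(round(x, 1) for x in state)

def optimistic_reward(state, extrinsic_reward):
    """Add an uncertainty bonus that shrinks as a state is visited more often."""
    key = hash_state(state)
    visit_counts[key] += 1
    bonus = beta / math.sqrt(visit_counts[key])
    return extrinsic_reward + bonus
</code></pre>
<p>Because the bonus decays with the visit count, the optimistic estimate of a state's value converges towards its true value as that state is explored more thoroughly.</p>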
<p>In Part II we then investigate the difficulties of learning as a team of agents. Although exploration remains of paramount importance in this multi-agent setting, we must first be able to confidently solve simpler problems in which we are not limited by our ability to explore. We consider the fully cooperative multi-agent setting, in which we wish to train a team of agents to solve a shared task. In Chapter 4 we explore a line of research into value function factorisation, which can allow for more efficient learning by considering the value function of the team as a whole in a tractable manner. We then investigate the consequences of this factorisation in Chapter 5, focussing on the negative implications it can have on learnability and exploration. To remedy these shortcomings we propose the use of a weighting in our optimisation objective. We then investigate the utility of this weighting beyond its original motivation, and identify new bottlenecks that arise in these improved approaches.</p>
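<p>As a concrete example of the kind of factorisation discussed above, the sketch below factorises the team value as a simple sum of per-agent utilities, with an optional per-sample weighting on the squared TD error. This additive form and the weighting rule are illustrative assumptions, not the specific factorisation or objective studied in Chapters 4 and 5.</p>
<pre><code># Illustrative sketch of a simple additive value factorisation with an
# optional per-sample weighting on the TD loss. The additive structure and
# the weighting are assumptions, not the exact objective from the thesis.
import torch
import torch.nn as nn

class AgentUtility(nn.Module):
    """Per-agent utility network Q_a(o_a, u_a) over a discrete action set."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs):
        return self.net(obs)

def joint_value(utilities, actions):
    """Factorise Q_tot as the sum of the chosen per-agent utilities."""
    chosen = [q.gather(-1, a.unsqueeze(-1)).squeeze(-1)
              for q, a in zip(utilities, actions)]
    return torch.stack(chosen, dim=0).sum(dim=0)

def weighted_td_loss(q_tot, target, weights):
    """Weighted squared TD error; a uniform weighting recovers the usual loss."""
    return (weights * (q_tot - target) ** 2).mean()
</code></pre>
<p>A uniform weighting recovers the ordinary unweighted objective, whereas a non-uniform weighting changes which joint actions the factorised value is encouraged to fit most accurately.</p>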
<p>Finally, in Part III we consider problems in which there is no well-defined objective. This is the case in games such as Chess or Go, where the overall goal is to learn to play as well as possible. However, this goal does not explicitly specify whom we should benchmark against, nor which opponents we should train against. In these settings, we can make progress by proposing our own appropriate opponent, and it is this step of proposing an appropriate opponent that we investigate further. In Chapter 6 we propose a Bayesian algorithm that can incorporate any prior knowledge we might have, and that takes advantage of the structure of the problem to efficiently explore the relevant quantities, thereby drastically reducing the time and computation required to propose such an opponent.</p>
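<p>As a toy illustration of how a Bayesian selection rule over candidate opponents could look, the sketch below maintains a Beta posterior over the probability of beating each opponent and uses posterior samples to propose the opponent closest to an even match. The posteriors, the matchmaking criterion, and all names here are hypothetical, and this is not the algorithm proposed in Chapter 6.</p>
<pre><code># Toy sketch of Bayesian opponent proposal via posterior sampling.
# The Beta posteriors over win probability and the "closest to an even
# match" criterion are illustrative assumptions, not the thesis algorithm.
import random

class OpponentPosterior:
    """Beta(alpha, beta) posterior over the probability of beating an opponent."""
    def __init__(self, prior_alpha=1.0, prior_beta=1.0):
        self.alpha = prior_alpha
        self.beta = prior_beta

    def update(self, won):
        """Update the posterior after a single game against this opponent."""
        if won:
            self.alpha += 1
        else:
            self.beta += 1

    def sample(self):
        """Draw one plausible win probability from the current posterior."""
        return random.betavariate(self.alpha, self.beta)

def propose_opponent(posteriors):
    """Sample each posterior and pick the opponent closest to an even match."""
    samples = {name: post.sample() for name, post in posteriors.items()}
    return min(samples, key=lambda name: abs(samples[name] - 0.5))
</code></pre>
<p>Sampling from the posteriors, rather than using point estimates, keeps some exploration over opponents whose strength is still uncertain, while the prior parameters provide a simple way to encode any prior knowledge about them.</p>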