Summary: |
  Learning control will enable the deployment of autonomous robots in unstructured real-world settings. Solving the associated complex decision processes under real-time constraints will require intuition: guiding current actions by prior experience to anticipate long-horizon environment interactions, while integrating with optimal control to ground action selection in short-horizon system constraints. Keeping the underlying learning process tractable requires maximizing the task-aligned information extracted from environment interactions while minimizing the guidance required through human intervention.

  In this thesis, we develop novel learning control algorithms that enable the efficient acquisition of complex behaviors while limiting prior knowledge, direct human supervision, and computational requirements. We focus on learning from interaction through reinforcement learning, combining insights from model-free, model-based, and hierarchical techniques.

  We design decoupled discrete policy structures that yield memory-efficient agent representations, and demonstrate the competitive performance of critic-only agents on continuous control tasks, highlighting accelerated information propagation and exploration benefits. We further leverage hierarchical abstraction over diverse behavior components to enable time-efficient optimization: our methods jointly learn heterogeneous low-level controller parameterizations via mixture policies for single-agent control, and decouple strategic from reactive reasoning across timescales for multi-agent team coordination. Lastly, we build latent world models for multi-step reasoning and sample-efficient interaction selection, employing uncertainty over expected long-term returns for targeted deep exploration and constructing multi-agent interaction models that accelerate competitive behavior learning via self-play in imagination.

  In sum, this thesis develops scalable and efficient robot learning algorithms by addressing representational challenges across layers of abstraction, providing agents with an intrinsic ability to set implicit exploration goals under high-level guidance, and facilitating information propagation in limited data regimes.
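To make the first contribution concrete, the minimal sketch below illustrates one plausible reading of a decoupled discrete policy structure: each action dimension maintains its own small critic head over a shared discretized bin grid, so memory scales linearly with the number of dimensions rather than exponentially with the joint action space, and the greedy continuous action is assembled from per-dimension argmax choices. This is an illustrative assumption, not the thesis' actual implementation; all names and numbers (n_dims, n_bins, q_values, select_action) are hypothetical.

```python
# Hypothetical sketch of a decoupled discrete critic for continuous control.
# Each action dimension keeps its own small value head over a shared bin
# grid, so memory scales linearly in the number of dimensions rather than
# exponentially in the joint discretized action space.
import numpy as np

n_dims, n_bins = 4, 11                 # assumed: 4 action dimensions, 11 bins each
bins = np.linspace(-1.0, 1.0, n_bins)  # shared discretization of each dimension
q_values = np.zeros((n_dims, n_bins))  # one independent critic head per dimension

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy selection: per-dimension argmax over decoupled heads."""
    if rng.random() < epsilon:
        idx = rng.integers(n_bins, size=n_dims)  # uniform exploration per dimension
    else:
        idx = q_values.argmax(axis=1)            # greedy bin in each dimension
    return bins[idx]                             # assembled continuous action vector

rng = np.random.default_rng(0)
action = select_action(q_values, epsilon=0.1, rng=rng)
print(action)  # a length-4 continuous action, e.g. [-1. -1. -1. -1.] pre-training
```

Under this reading, a single table (or network output) of shape (n_dims, n_bins) replaces a joint table of size n_bins ** n_dims, which is what makes the representation memory-efficient.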