Summary: What is the “state” of a pile of objects? A collection of standard Lagrangian states would, in principle, describe a pile exactly, but it is impractical to estimate Lagrangian states directly from images for use in visuomotor feedback control. Given this burden of state estimation, is there a practical alternative representation that lies closer to our observations? And how can we build predictive models over such representations that are useful for their task-agnostic generality? In the first chapter of this thesis, we investigate using the image observation directly as the state, and compare different predictive models over this space of representations. Surprisingly, we find that completely linear models describing the evolution of images outperform naive deep models, and perform on par with models that operate over particle-space representations. In the next chapter, we analyze the reason for this inductive bias of linear models by describing pixel space as a space of measures, and show the limitations of this approach outside of object pile manipulation. In the final chapter of this thesis, we present a more general solution to image-based control based on performing model-based reinforcement learning on the sufficient statistics of a task, which we call Approximate Information States (AIS). We demonstrate that when the model lacks sufficient inductive bias, model-based reinforcement learning is prone to two important pitfalls: distribution shift, and optimization that exploits model error. We tackle these problems through online learning and through risk-aware control that penalizes the variance of a model ensemble.
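To make the first chapter's idea concrete, here is a minimal sketch of a linear visual dynamics model, assuming images are flattened into vectors and the dynamics x_{t+1} ≈ A x_t + B u_t are fit by ordinary least squares. The function names, array shapes, and fitting procedure are illustrative assumptions, not the thesis implementation:

    import numpy as np

    def fit_linear_visual_dynamics(X_t, U_t, X_next):
        """Fit x_{t+1} ≈ A x_t + B u_t by least squares.

        X_t:    (T, d) flattened images at time t
        U_t:    (T, m) actions at time t
        X_next: (T, d) flattened images at time t+1
        """
        # Stack state and action into one regressor [x_t, u_t].
        Z = np.hstack([X_t, U_t])                      # (T, d + m)
        # Solve min_W ||Z W - X_next||_F^2, where W^T = [A  B].
        W, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
        A = W[:X_t.shape[1]].T                         # (d, d)
        B = W[X_t.shape[1]:].T                         # (d, m)
        return A, B

    def predict(A, B, x, u):
        # One-step prediction of the next image under the linear model.
        return A @ x + B @ u

Because the model is linear, fitting reduces to a single least-squares solve, and predictions remain in image space, which is what allows a direct comparison against deep and particle-space models.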
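The risk-aware control objective from the final chapter can be sketched similarly. The sketch below assumes an ensemble of learned one-step AIS predictors and a disagreement penalty weighted by lam; the names, the specific penalty form, and the averaging scheme are illustrative assumptions rather than the thesis's exact formulation:

    import numpy as np

    def risk_aware_cost(models, cost_fn, z, u, lam=1.0):
        """Mean predicted task cost plus a penalty on ensemble disagreement.

        models:  list of one-step predictors, each giving z' = f_i(z, u)
        cost_fn: task cost evaluated on a predicted state
        lam:     weight on the variance (risk) penalty
        """
        preds = np.stack([f(z, u) for f in models])      # (K, dim_z)
        mean_cost = np.mean([cost_fn(p) for p in preds])
        # Penalize regions where ensemble members diverge: disagreement
        # signals out-of-distribution states where model error is likely.
        disagreement = np.mean(np.var(preds, axis=0))
        return mean_cost + lam * disagreement

A planner that minimizes this cost is discouraged from steering into states where the ensemble disagrees, which is the mechanism by which distribution shift and optimizer exploitation of model error are mitigated.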