Summary: | Model-free deep RL algorithms, rooted in the concept of tabula rasa, suffer from poor
sample efficiency, which is a major obstacle to applying these methods to
real-world robotics problems. To improve sample efficiency, curriculum learning leverages
prior knowledge by reusing the policy obtained from simpler tasks to solve more difficult
ones. Although this is more systematic than learning from scratch, it raises
an important question: how to order a set of learning tasks. Deciding on task
difficulty in robotic learning setups is not trivial. To this end,
this thesis presents three algorithmic contributions for developing a
curriculum over multiple tasks of varying difficulty. Our first contribution is an
approach that generates a curriculum in a parameter-space task setup with a goal task
available for the agent to learn. A double-inverted pendulum setup is used where a
two-dimensional parameter space is created based on the lengths of the two pendulum
links. By parameterizing task difficulty in terms of pendulum link lengths, our algorithm
starts from the pendulum with the shortest links and creates a curriculum by finding
intermediate tasks from the parameter space to learn. Results showed that transferring
the start-task policy to even a single intermediate task before tackling the goal task is
two-fold more sample efficient than a direct transfer from the start task to the goal task and
ten-fold more sample efficient than learning the goal task from scratch. Our second and third contributions are
oriented towards environment setups where several tasks are to be learned by the agent
with minimal assumptions on task difficulty, with tasks related to robotic setups, such as
grasping objects of varying difficulty with a robotic manipulator. The second curriculum
learning algorithm is based on the idea of selecting tasks that are neither too easy nor too
difficult, i.e., of medium difficulty, so as to maintain positive learning progress. The evaluation
score is based on the number of successful grasps over a number of trials. Medium-difficulty tasks are
selected by comparing their evaluation scores against upper and lower difficulty
limits and are then assigned to the agent. The agent continues learning
the same task if a greater evaluation score is obtained after learning, meaning that the
agent is making positive learning progress with the task and could eventually achieve a
sufficiently high evaluation score such that the task could be marked as learned. We apply
this approach in a robotic grasping environment with objects of various geometric shapes
and show that the manipulator can learn to grasp difficult objects with our curriculum
learning approach, but not when learning from scratch (within a reasonable amount of
time). Our third contribution is an approach, named GloCAL, which is oriented towards
a generalization framework where the aim is to achieve few-shot learning for tasks that
are similar yet novel for the agent. This algorithm labels tasks as global or local based
on clustering of their learning evaluation scores. The global tasks are used to
transfer the policy from simple to difficult tasks and serve as the prior for
similar yet novel tasks. We compare this approach with a recent curriculum learning
algorithm named ALP-GMM and a randomly generated curriculum in a robotic grasping
environment and show its generalization capabilities with similar yet novel objects that were
not present when the curriculum was generated. Results showed that our GloCAL algorithm
not only surpasses existing curriculum learning methods in effectiveness,
grasping 100% of objects while the others converge to 86% despite 1.5x the training time, but
also performs few-shot transfer to similar yet novel objects.
|
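
The medium-difficulty selection rule of the second contribution can be illustrated with a minimal sketch. It is not the thesis implementation. The function names, the use of a success fraction as the evaluation score, and the limit values 0.2 and 0.8 are all assumptions made for illustration, since the summary does not state them.

```python
from typing import Dict, List


def evaluation_score(successes: int, trials: int) -> float:
    """Fraction of successful grasps over a number of trials (assumed score definition)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def medium_difficulty_tasks(scores: Dict[str, float], lower: float, upper: float) -> List[str]:
    """Return the tasks whose score lies between the lower and upper limits (inclusive)."""
    return sorted(task for task, score in scores.items() if lower <= score <= upper)


def keep_training(score_before: float, score_after: float) -> bool:
    """Continue with the same task only if the score increased after learning."""
    return score_after > score_before


# Hypothetical evaluation results for three objects.
scores = {
    "cube": evaluation_score(9, 10),  # above the upper limit: too easy
    "mug": evaluation_score(4, 10),   # between the limits: medium difficulty
    "fork": evaluation_score(0, 10),  # below the lower limit: too difficult
}

# The limits 0.2 and 0.8 are placeholder values, not taken from the thesis.
selected = medium_difficulty_tasks(scores, lower=0.2, upper=0.8)
print(selected)
```

With these placeholder limits only the hypothetical `mug` task is selected. The agent would keep training on it for as long as `keep_training` reports an improved score.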