Representing Unstructured Environments for Robotic Manipulation: Toward Generalization, Dexterity and Robustness

Bibliographic Details
Main Author: Gao, Wei
Other Authors: Tedrake, Russ
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/140017
Description
Summary: We would like robot manipulators that can handle a diversity of objects and environments, perform challenging manipulation tasks, and remain robust enough for deployment at scale. This thesis aims to build such a generalizable, dexterous, and robust manipulation pipeline. At the core of our approach is the representation of the environment; in particular, how should we represent the unstructured world so that it is useful for 1) developing a capable manipulation pipeline and 2) performing a thorough robustness evaluation of that pipeline? To answer question 1), we propose the keypoint affordance, a novel object representation consisting of 3D semantic keypoints. Existing works typically use 6 Degree-of-Freedom (DOF) poses to represent manipulated objects. However, representing an object with a parameterized transformation defined on a fixed template cannot handle large shape mismatches among different objects. In contrast, our keypoint representation captures task-related geometric information while ignoring irrelevant details, which enables generalization to unknown objects. We implement perception, planning, and feedback control modules on top of the keypoint representation and integrate them into a fully functional perception-to-action manipulation pipeline. The second part of this thesis studies the robustness of this pipeline and addresses question 2). Because a parametric (pose-based) object representation is infeasible here, we do not have a continuous input domain for investigating how object geometry affects robustness, which is a prerequisite for existing methods. To address this challenge, we model the factors that affect robustness as a structured distribution over variables (e.g., the camera pose) combined with an empirical distribution that describes visual properties (e.g., the object geometry/texture). We then formulate robustness evaluation as a failure-rate estimation problem on this combined distribution and propose an efficient graph-based algorithm to solve it. Our formulation is applied to the developed manipulation pipeline and can also benefit many other cyber-physical systems, such as autonomous cars.
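
To make the contrast between the two representations concrete, the following minimal Python sketch shows how a fixed-template 6-DOF pose and a set of named 3D semantic keypoints might be stored and used downstream. The class names, keypoint names, and the grasp_target helper are illustrative assumptions, not the thesis code.

```python
# Minimal sketch (not the thesis implementation) contrasting a fixed-template
# 6-DOF pose representation with a task-relevant keypoint representation.
from dataclasses import dataclass
import numpy as np


@dataclass
class PoseRepresentation:
    """Object state as a rigid transform of a fixed template mesh."""
    rotation: np.ndarray      # 3x3 rotation matrix
    translation: np.ndarray   # 3-vector


@dataclass
class KeypointRepresentation:
    """Object state as task-relevant 3D semantic keypoints.

    Only the geometry the task needs is kept (e.g. a mug's handle and rim
    centers), so object instances with very different shapes map to the same
    low-dimensional representation.
    """
    names: list[str]          # e.g. ["handle_center", "rim_center", "bottom_center"]
    positions: np.ndarray     # (K, 3) keypoint locations in the world frame


def grasp_target(kp: KeypointRepresentation, name: str = "handle_center") -> np.ndarray:
    """Illustrative downstream use: plan toward a named keypoint rather than a full pose."""
    return kp.positions[kp.names.index(name)]
```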
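The robustness-evaluation formulation can likewise be illustrated with a small sketch. The Python below estimates the failure rate over the combined distribution by plain Monte Carlo: structured variables (e.g. camera pose) are sampled parametrically, while visual properties (e.g. object geometry/texture) are drawn from an empirical dataset. The sampling functions, the pipeline_fails callable, and all parameter values are assumptions for illustration; the thesis's more efficient graph-based algorithm is not reproduced here.

```python
# Minimal Monte Carlo sketch of failure-rate estimation over a combined
# distribution; all interfaces and values are assumed for illustration.
import random
import numpy as np


def sample_structured(rng: np.random.Generator) -> dict:
    """Sample parametric factors, e.g. a perturbed camera position."""
    return {"camera_position": rng.normal([0.0, 0.0, 1.0], 0.05)}


def sample_empirical(object_dataset: list[dict], rng: random.Random) -> dict:
    """Draw one recorded object instance (geometry/texture) from a dataset."""
    return rng.choice(object_dataset)


def estimate_failure_rate(pipeline_fails, object_dataset, n_samples: int = 1000) -> float:
    """Monte Carlo estimate of P(failure) under the combined distribution."""
    np_rng = np.random.default_rng(0)
    py_rng = random.Random(0)
    failures = 0
    for _ in range(n_samples):
        # Combine a structured sample with an empirical sample into one scene.
        scene = {**sample_structured(np_rng), **sample_empirical(object_dataset, py_rng)}
        if pipeline_fails(scene):  # run the (simulated) pipeline; True means failure
            failures += 1
    return failures / n_samples
```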