Summary: | We propose a dual-module robotic system to tackle the problem of generating and performing antipodal robotic grasps for unknown objects from the n-channel image of the scene. We present an improved version of the Generative Residual Convolutional Neural Network (GR-ConvNet v2) model that can generate robust antipodal grasps from n-channel image input at real-time speeds (20 ms). We evaluated the proposed model architecture on three standard datasets and achieved a new state-of-the-art accuracy of 98.8%, 95.1%, and 97.4% on Cornell, Jacquard and Graspnet grasping datasets, respectively. Empirical results show that our model significantly outperformed the prior work with a stricter IoU-based grasp detection metric. We conducted a suite of tests in simulation and the real world on a diverse set of previously unseen objects with adversarial geometry and household items. We demonstrate the adaptability of our approach by directly transferring the trained model to a 7 DoF robotic manipulator with a grasp success rate of 95.4% and 93.0% on novel household and adversarial objects, respectively. Furthermore, we validate the generalization capability of our pixel-wise grasp prediction model by validating it on complex Ravens-10 benchmark tasks, some of which require closed-loop visual feedback for multi-step sequencing.
|