Point Cloud Hand–Object Segmentation Using Multimodal Imaging with Thermal and Color Data for Safe Robotic Object Handover

Bibliographic Details
Main Authors: Yan Zhang, Steffen Müller, Benedict Stephan, Horst-Michael Gross, Gunther Notni
Format: Article
Language: English
Published: MDPI AG 2021-08-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/21/16/5676
Description
Summary: This paper presents an application of neural networks operating on multimodal 3D data (3D point cloud, RGB, thermal) to effectively and precisely segment human hands and objects held in the hand in order to realize a safe human–robot object handover. We discuss the problems encountered in building a multimodal sensor system, with a focus on the calibration and alignment of a set of cameras including RGB, thermal, and NIR cameras. We propose the use of a copper–plastic chessboard calibration target with an internal active light source (near-infrared and visible light). After brief heating, the calibration target can be captured simultaneously and legibly by all cameras. Based on the multimodal dataset captured by our sensor system, PointNet, PointNet++, and RandLA-Net are used to verify the effectiveness of multimodal point cloud data for hand–object segmentation. These networks were trained on various data modes (XYZ, XYZ-T, XYZ-RGB, and XYZ-RGB-T). The experimental results show a significant improvement in segmentation performance with XYZ-RGB-T (mean Intersection over Union: 82.8% by RandLA-Net) compared with the other three modes (77.3% by XYZ-RGB, 35.7% by XYZ-T, 35.7% by XYZ); notably, the Intersection over Union for the hand class alone reaches 92.6%.
ISSN: 1424-8220
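
The summary reports mean Intersection over Union (mIoU) scores for point clouds carrying different per-point feature channels. As a rough illustration only (not the authors' code; the array shapes, class labels, and function names below are assumptions), the following sketch shows how such feature modes could be assembled and how per-class IoU and mIoU are commonly computed:

```python
# Minimal sketch, assuming three segmentation classes (background, hand, object)
# and per-point XYZ coordinates, RGB color, and a single thermal channel.
import numpy as np

def build_features(xyz, rgb=None, thermal=None):
    """Stack per-point channels into one of the modes XYZ, XYZ-T, XYZ-RGB, XYZ-RGB-T."""
    feats = [xyz]                             # (N, 3) point coordinates
    if rgb is not None:
        feats.append(rgb)                     # (N, 3) color, e.g. normalized to [0, 1]
    if thermal is not None:
        feats.append(thermal.reshape(-1, 1))  # (N, 1) temperature channel
    return np.concatenate(feats, axis=1)

def mean_iou(pred, gt, num_classes=3):
    """Per-class IoU and their mean over the classes present in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return np.array(ious), float(np.mean(ious))

# Example with random data, purely to show the interfaces:
rng = np.random.default_rng(0)
xyz = rng.normal(size=(1024, 3)).astype(np.float32)
rgb = rng.random((1024, 3), dtype=np.float32)
thermal = rng.random(1024).astype(np.float32)
points_xyz_rgb_t = build_features(xyz, rgb, thermal)  # (1024, 7) network input for XYZ-RGB-T
pred = rng.integers(0, 3, size=1024)                  # predicted per-point labels
gt = rng.integers(0, 3, size=1024)                    # ground-truth per-point labels
per_class_iou, miou = mean_iou(pred, gt)
```

In this reading, the XYZ-RGB-T mode simply concatenates a one-channel temperature value to the colored point coordinates, giving a 7-dimensional input feature per point, while XYZ, XYZ-T, and XYZ-RGB drop the unused channels.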