Transparency-Aware Segmentation of Glass Objects to Train RGB-Based Pose Estimators

Robotic manipulation requires object pose knowledge for the objects of interest. In order to perform typical household chores, a robot needs to be able to estimate 6D poses for objects such as water glasses or salad bowls. This is especially difficult for glass objects, as for these, depth data are mostly disturbed, and in RGB images, occluded objects are still visible. Thus, in this paper, we propose to redefine the ground-truth for training RGB-based pose estimators in two ways: (a) we apply a transparency-aware multisegmentation, in which an image pixel can belong to more than one object, and (b) we use transparency-aware bounding boxes, which always enclose whole objects, even if parts of an object are formally occluded by another object. The latter approach ensures that the size and scale of an object remain more consistent across different images. We train our pose estimator, which was originally designed for opaque objects, with three different ground-truth types on the ClearPose dataset. Just by changing the training data to our transparency-aware segmentation, with no additional glass-specific feature changes in the estimator, the ADD-S AUC value increases by 4.3%. Such a multisegmentation can be created for every dataset that provides a 3D model of the object and its ground-truth pose.
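
The record only abstracts the method, but idea (a) is concrete enough to sketch. Below is a minimal, hypothetical Python illustration of such a multisegmentation, assuming per-object triangle meshes, ground-truth poses, and pinhole intrinsics: each object is rasterized independently, with no z-buffer across objects, so a pixel seen through a glass can belong to several masks at once. All names are illustrative assumptions, not code from the paper.

    # Hypothetical sketch of transparency-aware multisegmentation (not the
    # authors' pipeline): every object is rendered on its own, so masks of
    # overlapping objects overlap too, and a pixel may belong to several objects.
    import numpy as np
    import cv2

    def object_mask(vertices, faces, R, t, K, height, width):
        # vertices: (N, 3) model points in object frame; faces: (M, 3) indices
        # R, t: ground-truth pose (camera frame); K: (3, 3) pinhole intrinsics.
        # Assumes the object lies fully in front of the camera.
        cam = vertices @ R.T + t                  # object -> camera frame
        uv = cam @ K.T
        uv = (uv[:, :2] / uv[:, 2:3]).astype(np.int32)   # perspective divide
        mask = np.zeros((height, width), np.uint8)
        for tri in faces:                         # rasterize every face with no
            cv2.fillPoly(mask, [uv[tri]], 1)      # z-buffer: occluded parts stay
        return mask.astype(bool)                  # part of the silhouette

    def multisegmentation(objects, K, height, width):
        # Stack of per-object masks; a pixel may be True in more than one layer.
        return np.stack([object_mask(o['verts'], o['faces'], o['R'], o['t'],
                                     K, height, width) for o in objects])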

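Idea (b) can be sketched the same way: a transparency-aware bounding box is derived from the whole projected model rather than from the visible pixels, so occlusion by another object cannot shrink it, which keeps the object's apparent size and scale consistent across images. Again a hypothetical sketch (image-border clipping omitted):

    def transparency_aware_bbox(vertices, R, t, K):
        # Axis-aligned box around the *entire* projected model, independent of
        # what other objects occlude in the rendered image.
        cam = vertices @ R.T + t
        uv = cam @ K.T
        uv = uv[:, :2] / uv[:, 2:3]
        x0, y0 = uv.min(axis=0)
        x1, y1 = uv.max(axis=0)
        return x0, y0, x1, y1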
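
The reported metric, ADD-S AUC, is the area under the accuracy-versus-threshold curve of the symmetric average-distance error (ADD-S): for each test pose, the mean distance from every model point under the predicted pose to its nearest model point under the ground-truth pose. The sketch below assumes the common PoseCNN-style convention of integrating accuracy up to a 10 cm threshold; the paper may use a different cutoff.

    import numpy as np
    from scipy.spatial import cKDTree

    def add_s(pts, R_gt, t_gt, R_pred, t_pred):
        # Nearest-neighbor matching makes the error invariant to object symmetry.
        gt = pts @ R_gt.T + t_gt
        pred = pts @ R_pred.T + t_pred
        dists, _ = cKDTree(gt).query(pred)
        return dists.mean()

    def add_s_auc(errors, max_t=0.10):
        # accuracy(t) = fraction of test poses with ADD-S error below t.
        # It is a step function, so its integral over [0, max_t] has a
        # closed form: each error e < max_t contributes (max_t - e).
        e = np.asarray(errors, dtype=float)
        kept = e[e < max_t]
        return float(np.sum(max_t - kept) / (max_t * len(e)))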

Bibliographic Details
Main Authors: Maira Weidenbach, Tim Laue, Udo Frese
Author Affiliation: Faculty of Mathematics and Computer Science, University of Bremen, 28359 Bremen, Germany
Format: Article
Language: English
Published: MDPI AG, 2024-01-01
Series: Sensors, vol. 24, no. 2, article no. 432
ISSN: 1424-8220
DOI: 10.3390/s24020432
Subjects: neural networks; training data; transparent objects; bounding box; segmentation; pose estimation
Online Access: https://www.mdpi.com/1424-8220/24/2/432