Transparency-Aware Segmentation of Glass Objects to Train RGB-Based Pose Estimators
Robotic manipulation requires object pose knowledge for the objects of interest. In order to perform typical household chores, a robot needs to be able to estimate 6D poses for objects such as water glasses or salad bowls. This is especially difficult for glass objects, as for these, depth data are mostly disturbed, and in RGB images, occluded objects are still visible. Thus, in this paper, we propose to redefine the ground-truth for training RGB-based pose estimators in two ways: (a) we apply a transparency-aware multisegmentation, in which an image pixel can belong to more than one object, and (b) we use transparency-aware bounding boxes, which always enclose whole objects, even if parts of an object are formally occluded by another object. The latter approach ensures that the size and scale of an object remain more consistent across different images. We train our pose estimator, which was originally designed for opaque objects, with three different ground-truth types on the ClearPose dataset. Just by changing the training data to our transparency-aware segmentation, with no additional glass-specific feature changes in the estimator, the ADD-S AUC value increases by 4.3%. Such a multisegmentation can be created for every dataset that provides a 3D model of the object and its ground-truth pose.
Main Authors: | Maira Weidenbach, Tim Laue, Udo Frese |
Affiliation: | Faculty of Mathematics and Computer Science, University of Bremen, 28359 Bremen, Germany (all authors) |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-01-01 |
Series: | Sensors, Vol. 24, Issue 2, Article 432 |
ISSN: | 1424-8220 |
DOI: | 10.3390/s24020432 |
Collection: | DOAJ |
Subjects: | neural networks; training data; transparent objects; bounding box; segmentation; pose estimation |
Online Access: | https://www.mdpi.com/1424-8220/24/2/432 |
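The construction described in the abstract — per-object amodal masks and boxes obtained by rendering each object's 3D model at its ground-truth pose — can be illustrated with a short script. The following is a minimal sketch, not the authors' released code: the function names and the mesh/pose inputs are hypothetical, the poses are assumed to be object-to-camera transforms in OpenCV convention with pinhole intrinsics, and the use of trimesh plus pyrender for depth-only rendering is one possible implementation choice, not necessarily the pipeline used in the paper.

```python
# Editor's sketch of transparency-aware ground-truth generation; all names
# (amodal_masks, amodal_bbox, mesh_paths, poses_cam, ...) are hypothetical.
# Requires: numpy, trimesh, pyrender. Offscreen rendering needs an OpenGL
# context (e.g. PYOPENGL_PLATFORM=egl on a headless machine).
import numpy as np
import trimesh
import pyrender

# OpenCV camera frame (x right, y down, z forward) -> OpenGL camera frame
# used by pyrender (y up, camera looks along -z): flip the y and z axes.
CV_TO_GL = np.diag([1.0, -1.0, -1.0, 1.0])

def amodal_masks(mesh_paths, poses_cam, fx, fy, cx, cy, width, height):
    """Render each object *alone* at its ground-truth pose, so its mask
    covers the full silhouette even where another object occludes it.
    Stacking the per-object masks yields a multisegmentation in which a
    single pixel may belong to several objects."""
    camera = pyrender.IntrinsicsCamera(fx=fx, fy=fy, cx=cx, cy=cy)
    renderer = pyrender.OffscreenRenderer(width, height)
    masks = []
    for path, pose in zip(mesh_paths, poses_cam):
        scene = pyrender.Scene()
        mesh = pyrender.Mesh.from_trimesh(trimesh.load(path, force='mesh'))
        # Express the object-to-camera pose in the OpenGL camera frame.
        scene.add(mesh, pose=CV_TO_GL @ pose)
        scene.add(camera, pose=np.eye(4))  # camera sits at the origin
        # Depth-only pass: any pixel with depth > 0 lies on the object.
        depth = renderer.render(scene, flags=pyrender.RenderFlags.DEPTH_ONLY)
        masks.append(depth > 0)
    renderer.delete()
    return np.stack(masks)  # shape (num_objects, H, W), overlaps allowed

def amodal_bbox(mask):
    """Transparency-aware bounding box: the tight box around the full,
    unoccluded silhouette, keeping object scale consistent across images."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # object projects entirely outside the image
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

Rendering each object in its own scene is what makes the labels transparency-aware: occluders are simply absent from the render, so a pixel behind a glass can carry several object labels, and the box derived from such a mask always encloses the whole object, matching the paper's observation that this only requires a 3D model and a ground-truth pose per object.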