Occlusion-Robust Pallet Pose Estimation for Warehouse Automation

Bibliographic Details
Main Authors: Van-Duc Vu, Dinh-Dai Hoang, Phan Xuan Tan, Van-Thiep Nguyen, Thu-Uyen Nguyen, Ngoc-Anh Hoang, Khanh-Toan Phan, Duc-Thanh Tran, Duy-Quang Vu, Phuc-Quan Ngo, Quang-Tri Duong, Anh-Nhat Nguyen, Dinh-Cuong Hoang
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision
Online Access: https://ieeexplore.ieee.org/document/10378693/
_version_ 1797361442249244672
author Van-Duc Vu
Dinh-Dai Hoang
Phan Xuan Tan
Van-Thiep Nguyen
Thu-Uyen Nguyen
Ngoc-Anh Hoang
Khanh-Toan Phan
Duc-Thanh Tran
Duy-Quang Vu
Phuc-Quan Ngo
Quang-Tri Duong
Anh-Nhat Nguyen
Dinh-Cuong Hoang
author_facet Van-Duc Vu
Dinh-Dai Hoang
Phan Xuan Tan
Van-Thiep Nguyen
Thu-Uyen Nguyen
Ngoc-Anh Hoang
Khanh-Toan Phan
Duc-Thanh Tran
Duy-Quang Vu
Phuc-Quan Ngo
Quang-Tri Duong
Anh-Nhat Nguyen
Dinh-Cuong Hoang
author_sort Van-Duc Vu
collection DOAJ
description Accurate detection and estimation of pallet poses from color and depth data (RGB-D) are integral components of many advanced intelligent warehouse systems. State-of-the-art object pose estimation methods follow a two-stage process, relying on off-the-shelf segmentation or object detection in the initial stage and subsequently predicting the pose of objects using cropped images. The cropped patches may include both the target object and irrelevant information, such as background or other objects, leading to challenges in handling pallets in warehouse settings with heavy occlusions from loaded objects. In this study, we propose an innovative deep learning-based approach to address the occlusion problem in pallet pose estimation from RGB-D images. Inspired by the selective attention mechanism in human perception, our model learns to identify and attenuate the significance of features in occluded regions, focusing on the visible and informative areas for accurate pose estimation. Instead of directly estimating pallet poses from cropped patches as in existing methods, we introduce two feature map re-weighting modules with cross-modal attention. These modules effectively filter out features from occluded regions and the background, enhancing pose estimation accuracy. Furthermore, we introduce a large-scale annotated pallet dataset specifically designed to capture occlusion scenarios in warehouse environments, facilitating comprehensive training and evaluation. Experimental results on the newly collected pallet dataset show that our proposed method increases accuracy by 13.5% compared to state-of-the-art methods.
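To illustrate the cross-modal feature re-weighting idea described in the abstract, the following is a minimal PyTorch sketch. It is not the authors' implementation: the module name CrossModalReweight, the channel/spatial attention split, and all layer sizes are assumptions made for illustration only; the paper's actual re-weighting modules may differ.

```python
# Minimal sketch (hypothetical, not the authors' code): re-weighting an RGB
# feature map with attention weights derived from the depth modality, so that
# responses in occluded or background regions can be attenuated before pose
# regression.
import torch
import torch.nn as nn


class CrossModalReweight(nn.Module):
    """Re-weights RGB features using channel attention from depth features
    and spatial attention from both modalities jointly."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel attention conditioned on globally pooled depth features.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Spatial attention from the concatenated RGB and depth feature maps.
        self.spatial = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, depth_feat: (B, C, H, W) feature maps from two backbones.
        b, c, _, _ = rgb_feat.shape
        # Channel weights from the depth modality.
        chan_w = self.mlp(depth_feat.mean(dim=(2, 3))).view(b, c, 1, 1)
        # Spatial weights from both modalities.
        spat_w = self.spatial(torch.cat([rgb_feat, depth_feat], dim=1))
        # Attenuate features judged uninformative (e.g. occluded regions).
        return rgb_feat * chan_w * spat_w


if __name__ == "__main__":
    reweight = CrossModalReweight(channels=64)
    rgb = torch.randn(2, 64, 40, 40)
    depth = torch.randn(2, 64, 40, 40)
    print(reweight(rgb, depth).shape)  # torch.Size([2, 64, 40, 40])
```

In this sketch the re-weighted map keeps the input resolution, so it can replace the original RGB features in a standard two-stage pose-estimation pipeline; the choice of channel plus spatial gating is only one plausible way to realize the "filter out occluded-region features" behavior the abstract describes.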
first_indexed 2024-03-08T15:53:45Z
format Article
id doaj.art-85818809aecf417c88c5ae9dd093fb5d
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-08T15:53:45Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-85818809aecf417c88c5ae9dd093fb5d
Record timestamp: 2024-01-09T00:04:59Z
Language: eng
Publisher: IEEE
Series: IEEE Access (ISSN 2169-3536)
Published: 2024-01-01, vol. 12, pp. 1927-1942
DOI: 10.1109/ACCESS.2023.3348781 (IEEE document 10378693)
Title: Occlusion-Robust Pallet Pose Estimation for Warehouse Automation
Authors and affiliations:
Van-Duc Vu, ICT Department, FPT University, Hanoi, Vietnam
Dinh-Dai Hoang, Toyohashi University of Technology, Toyohashi, Japan
Phan Xuan Tan (https://orcid.org/0000-0002-9592-0226), College of Engineering, Shibaura Institute of Technology, Tokyo, Japan
Van-Thiep Nguyen, ICT Department, FPT University, Hanoi, Vietnam
Thu-Uyen Nguyen, ICT Department, FPT University, Hanoi, Vietnam
Ngoc-Anh Hoang, ICT Department, FPT University, Hanoi, Vietnam
Khanh-Toan Phan, ICT Department, FPT University, Hanoi, Vietnam
Duc-Thanh Tran, ICT Department, FPT University, Hanoi, Vietnam
Duy-Quang Vu (https://orcid.org/0009-0008-8349-454X), ICT Department, FPT University, Hanoi, Vietnam
Phuc-Quan Ngo, ICT Department, FPT University, Hanoi, Vietnam
Quang-Tri Duong, ICT Department, FPT University, Hanoi, Vietnam
Anh-Nhat Nguyen, ICT Department, FPT University, Hanoi, Vietnam
Dinh-Cuong Hoang (https://orcid.org/0000-0001-6058-2426), ICT Department, FPT University, Hanoi, Vietnam
Abstract: as given in the description field above
Online Access: https://ieeexplore.ieee.org/document/10378693/
Keywords: Pose estimation; robot vision systems; intelligent systems; deep learning; supervised learning; machine vision
spellingShingle Van-Duc Vu
Dinh-Dai Hoang
Phan Xuan Tan
Van-Thiep Nguyen
Thu-Uyen Nguyen
Ngoc-Anh Hoang
Khanh-Toan Phan
Duc-Thanh Tran
Duy-Quang Vu
Phuc-Quan Ngo
Quang-Tri Duong
Anh-Nhat Nguyen
Dinh-Cuong Hoang
Occlusion-Robust Pallet Pose Estimation for Warehouse Automation
IEEE Access
Pose estimation
robot vision systems
intelligent systems
deep learning
supervised learning
machine vision
title Occlusion-Robust Pallet Pose Estimation for Warehouse Automation
title_full Occlusion-Robust Pallet Pose Estimation for Warehouse Automation
title_fullStr Occlusion-Robust Pallet Pose Estimation for Warehouse Automation
title_full_unstemmed Occlusion-Robust Pallet Pose Estimation for Warehouse Automation
title_short Occlusion-Robust Pallet Pose Estimation for Warehouse Automation
title_sort occlusion robust pallet pose estimation for warehouse automation
topic Pose estimation
robot vision systems
intelligent systems
deep learning
supervised learning
machine vision
url https://ieeexplore.ieee.org/document/10378693/
work_keys_str_mv AT vanducvu occlusionrobustpalletposeestimationforwarehouseautomation
AT dinhdaihoang occlusionrobustpalletposeestimationforwarehouseautomation
AT phanxuantan occlusionrobustpalletposeestimationforwarehouseautomation
AT vanthiepnguyen occlusionrobustpalletposeestimationforwarehouseautomation
AT thuuyennguyen occlusionrobustpalletposeestimationforwarehouseautomation
AT ngocanhhoang occlusionrobustpalletposeestimationforwarehouseautomation
AT khanhtoanphan occlusionrobustpalletposeestimationforwarehouseautomation
AT ducthanhtran occlusionrobustpalletposeestimationforwarehouseautomation
AT duyquangvu occlusionrobustpalletposeestimationforwarehouseautomation
AT phucquanngo occlusionrobustpalletposeestimationforwarehouseautomation
AT quangtriduong occlusionrobustpalletposeestimationforwarehouseautomation
AT anhnhatnguyen occlusionrobustpalletposeestimationforwarehouseautomation
AT dinhcuonghoang occlusionrobustpalletposeestimationforwarehouseautomation