Summary: | Because 6-degree-of-freedom (6-DoF) object pose estimation datasets are difficult to generate and a domain gap exists between synthetic and real data, existing pose estimation methods struggle to improve accuracy and generalization. This paper proposes a methodology that combines a higher-quality dataset with deep learning-based methods to narrow the domain gap between synthetic and real data and improve pose estimation accuracy. The high-quality dataset is generated with BlenderProc and then processed with bilateral filtering to reduce the gap. A novel attention-based Mask region-based convolutional neural network (Mask R-CNN) is proposed to reduce the computation cost and improve detection accuracy. Meanwhile, an improved feature pyramid network (iFPN) is obtained by adding a bottom-up path that strengthens the extraction of features from the lower layers. In addition, a novel convolutional block attention module–convolutional denoising autoencoder (CBAM–CDAE) network is proposed, which introduces channel and spatial attention mechanisms to improve the autoencoder's ability to extract image features. Finally, an accurate 6-DoF object pose is obtained through pose refinement. The proposed approach is compared with other models on the T-LESS and LineMOD datasets. The comparison results show that the proposed approach outperforms the other pose estimation models.
|