Summary: | This paper addresses the problem of instance-level 6DoF pose estimation from a single RGBD image of an indoor scene. Many recent works have shown that a two-stage network, which first detects keypoints and then uses them to regress the 6D pose, achieves remarkable performance. However, previous methods pay little attention to channel-wise attention, and their keypoints are not selected using the RGBD information comprehensively, which limits the performance of the network. To enhance the representation ability of the RGB features, a modular Split-Attention block that enables attention across feature-map groups is proposed. In addition, by combining Oriented FAST and Rotated BRIEF (ORB) keypoints with the Farthest Point Sampling (FPS) algorithm, a simple but effective keypoint selection method named ORB-FPS is presented to prevent keypoints from falling on non-salient regions. The proposed algorithm is evaluated on the Linemod and YCB-Video datasets. The experimental results demonstrate that our method outperforms current approaches, achieving an ADD(S) accuracy of 94.5% on the Linemod dataset and 91.4% on the YCB-Video dataset.
|
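The summary names Farthest Point Sampling as one half of ORB-FPS but does not spell out the procedure. The following is a minimal sketch of generic greedy FPS over a candidate set of 3D points, written under the assumption that the candidates would be ORB-detected points lifted to 3D; how the paper actually builds that candidate set is not stated here. The function name `farthest_point_sample` and its parameters are hypothetical, chosen only for illustration.

```python
from math import dist


def farthest_point_sample(points, k, start=0):
    """Greedy FPS sketch: return k indices into `points`.

    Each new index is the point farthest from the set already selected.
    `points` is a sequence of equal-length coordinate tuples. In an
    ORB-FPS-style pipeline these would presumably be ORB keypoints
    lifted to 3D (an assumption, not confirmed by the summary).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    selected = [start]
    # min_dist[i] = distance from point i to its nearest selected point.
    min_dist = [dist(p, points[start]) for p in points]

    while len(selected) < k:
        # Pick the point whose nearest selected neighbour is farthest away.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Refresh nearest-selected distances with the newly added point.
        min_dist = [min(d, dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]

    return selected
```

Because FPS only ever chooses from the points it is given, restricting the input to salient candidates is what would keep the selected keypoints off non-salient regions, while the greedy farthest-first rule spreads them across the object.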