Zero-shot category-level object pose estimation

Bibliographic record details

Main authors: Goodwin, W; Vaze, S; Havoutis, I; Posner, I
Format: Conference item
Language: English
Published: Springer, 2022
author Goodwin, W
Vaze, S
Havoutis, I
Posner, I
collection OXFORD
description Object pose estimation is an important component of most vision pipelines for embodied agents, as well as in 3D vision more generally. In this paper we tackle the problem of estimating the pose of novel object categories in a zero-shot manner. This extends much of the existing literature by removing the need for pose-labelled datasets or category-specific CAD models for training or inference. Specifically, we make the following contributions. First, we formalise the zero-shot, category-level pose estimation problem and frame it in a way that is most applicable to real-world embodied agents. Secondly, we propose a novel method based on semantic correspondences from a self-supervised vision transformer to solve the pose estimation problem. We further re-purpose the recent CO3D dataset to present a controlled and realistic test setting. Finally, we demonstrate that all baselines for our proposed task perform poorly, and show that our method provides a six-fold improvement in average rotation accuracy at 30°. Our code is available at https://github.com/applied-ai-lab/zero-shot-pose.
format Conference item
id oxford-uuid:c4443a39-1ad7-4704-a024-149893eaf5cb
institution University of Oxford
language English
publishDate 2022
publisher Springer
title Zero-shot category-level object pose estimation
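
Illustrative note. The description above outlines a pipeline in which semantic correspondences from a self-supervised vision transformer are used to recover relative object pose, evaluated by rotation accuracy at 30°. The sketch below is not the authors' released implementation (see https://github.com/applied-ai-lab/zero-shot-pose for that); it is a minimal illustration which assumes matched feature locations have already been lifted to two 3D point sets, src_pts and tgt_pts (hypothetical names), recovers a relative rotation with a standard Kabsch/orthogonal-Procrustes fit, and scores it with an accuracy-at-threshold metric of the kind quoted in the abstract.

import numpy as np

def kabsch_rotation(src_pts, tgt_pts):
    """Least-squares rotation aligning src_pts to tgt_pts (both arrays of shape (N, 3)).

    In the zero-shot setting described above, the correspondences would come from
    matching self-supervised ViT features across views rather than from pose labels.
    """
    src_c = src_pts - src_pts.mean(axis=0)
    tgt_c = tgt_pts - tgt_pts.mean(axis=0)
    H = src_c.T @ tgt_c                      # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection solution
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between an estimated and a ground-truth rotation."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def accuracy_at_threshold(errors_deg, threshold=30.0):
    """Fraction of test pairs with rotation error below `threshold` degrees,
    i.e. an 'accuracy at 30 degrees' style metric."""
    return float((np.asarray(errors_deg, dtype=float) < threshold).mean())

Under these assumptions, averaging accuracy_at_threshold over category-level test pairs yields a number comparable in spirit to the rotation-accuracy figure quoted in the description above.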