Capturing the geometry of object categories from video supervision

Bibliographic Details
Main Authors:	Novotny, D, Larlus, D, Vedaldi, A
Format:	Journal article
Language:	English
Published:	Institute of Electrical and Electronics Engineers 2018

author	Novotny, D Larlus, D Vedaldi, A
collection	OXFORD
description	In this article, we are interested in capturing the 3D geometry of object categories simply by looking around them. Our unsupervised method fundamentally departs from traditional approaches that require either CAD models or manual supervision. It only uses video sequences capturing a handful of instances of an object category to train a deep architecture tailored for extracting 3D geometry predictions. Our deep architecture has three components. First, a Siamese viewpoint factorization network robustly aligns the input videos and, as a consequence, learns to predict the absolute category-specific viewpoint from a single image depicting any previously unseen instance of that category. Second, a depth estimation network performs monocular depth prediction. Finally, a 3D shape completion network predicts the full shape of the depicted object instance by re-using the output of the monocular depth prediction module. We also propose a way to configure networks so they can perform probabilistic predictions. We demonstrate that, properly used in our framework, this self-assessment mechanism is crucial for obtaining high quality predictions. Our network achieves state-of-the-art results on viewpoint prediction, depth estimation, and 3D point cloud estimation on public benchmarks.
first_indexed	2024-03-06T22:11:34Z
format	Journal article
id	oxford-uuid:51fa438e-ed36-4043-a6f4-a30609c9e428
institution	University of Oxford
language	English
last_indexed	2024-03-06T22:11:34Z
publishDate	2018
publisher	Institute of Electrical and Electronics Engineers
record_format	dspace
spelling	oxford-uuid:51fa438e-ed36-4043-a6f4-a30609c9e428 | 2022-03-26T16:22:54Z | Capturing the geometry of object categories from video supervision | Journal article | http://purl.org/coar/resource_type/c_dcae04bc | uuid:51fa438e-ed36-4043-a6f4-a30609c9e428 | English | Symplectic Elements at Oxford | Institute of Electrical and Electronics Engineers | 2018 | Novotny, D | Larlus, D | Vedaldi, A
title	Capturing the geometry of object categories from video supervision


Similar Items