Slim DensePose: Thrifty learning from sparse annotations and motion cues

DensePose supersedes traditional landmark detectors by densely mapping image pixels to body surface coordinates. This power, however, comes at the cost of greatly increased annotation time, since supervising the model requires manually labelling hundreds of points per pose instance. In this work, we therefore seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection strategies. In particular, we demonstrate that if annotations are collected in video frames, their efficacy can be multiplied for free by using motion cues. To explore this idea, we introduce DensePose-Track, a dataset of videos where selected frames are annotated in the traditional DensePose manner. Then, building on geometric properties of the DensePose mapping, we use the video dynamics to propagate ground-truth annotations in time as well as to learn from Siamese equivariance constraints. Having performed an exhaustive empirical evaluation of various data annotation and learning strategies, we demonstrate that doing so can deliver significantly improved pose estimation results over strong baselines. However, despite what is suggested by some recent works, we show that merely synthesizing motion patterns by applying geometric transformations to isolated frames is significantly less effective, and that motion cues help much more when they are extracted from videos.
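To make the two mechanisms described in the abstract concrete, the following is a minimal, hypothetical Python/PyTorch sketch (not the authors' released code) of flow-based annotation propagation and a Siamese equivariance loss. The names densepose_net, fg_mask, and the flow fields are assumptions introduced for illustration only; the actual DensePose output also includes discrete body-part labels in addition to UV coordinates, which are omitted here.

# A minimal sketch, assuming a hypothetical `densepose_net` that predicts
# per-pixel UV body-surface coordinates, and precomputed optical flow between
# consecutive video frames.
import torch
import torch.nn.functional as F

def warp_with_flow(tensor, flow):
    """Backward-warp a (B, C, H, W) tensor with a (B, 2, H, W) pixel-space
    flow field that maps each target pixel back to its source location."""
    _, _, h, w = tensor.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(tensor.device)   # (2, H, W)
    grid = base.unsqueeze(0) + flow                                 # sample at p + flow(p)
    grid_x = 2.0 * grid[:, 0] / (w - 1) - 1.0                       # normalise to [-1, 1]
    grid_y = 2.0 * grid[:, 1] / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)                    # (B, H, W, 2)
    return F.grid_sample(tensor, grid, align_corners=True)

def propagate_annotations(uv_labels_t, flow_t1_to_t):
    # Idea 1: reuse sparse ground-truth UV annotations of frame t in frame t+1
    # "for free" by warping them along the motion (backward flow t+1 -> t).
    return warp_with_flow(uv_labels_t, flow_t1_to_t)

def equivariance_loss(densepose_net, frame_t, frame_t1, flow_t1_to_t, fg_mask):
    # Idea 2: Siamese equivariance constraint -- corresponding pixels in the
    # two frames should be assigned the same body-surface coordinates.
    uv_t = densepose_net(frame_t)      # (B, 2, H, W) predicted UV coordinates
    uv_t1 = densepose_net(frame_t1)
    uv_t_warped = warp_with_flow(uv_t, flow_t1_to_t)
    return (fg_mask * (uv_t_warped - uv_t1).abs()).mean()

As the abstract notes, the key point is that the warp comes from real video motion; applying the same construction to synthetic geometric transformations of isolated frames is reported to be markedly less effective.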

Bibliographic Details
Main Authors: Neverova, N, Thewlis, J, Güler, R, Kokkinos, I, Vedaldi, A
Format: Conference item
Language: English
Published: IEEE 2020
Institution: University of Oxford