Exploiting temporal context for 3D human pose estimation in the wild
We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos. Unlike previous algorithms, which operate on single frames, we show that reconstructing a person over an entire sequence gives extra constraints that can resolve ambiguities. This is because videos often give multiple views of a person, yet the overall body shape does not change and 3D positions vary slowly. Our method improves not only on standard mocap-based datasets like Human 3.6M -- where we show quantitative improvements -- but also on challenging in-the-wild datasets such as Kinetics. Building upon our algorithm, we present a new dataset of more than 3 million frames of YouTube videos from Kinetics with automatically generated 3D poses and meshes. Evaluating on the 3DPW and HumanEVA datasets, we show that retraining a single-frame 3D pose estimator on this data improves accuracy on both real-world and mocap data.
Main Authors: | Arnab, A; Doersch, C; Zisserman, A |
---|---|
Format: | Conference item |
Language: | English |
Published: | IEEE, 2020 |
Institution: | University of Oxford |
Collection: | OXFORD |
Record ID: | oxford-uuid:d990cf97-bf69-4798-ba74-c31763b58a25 |
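The abstract describes the core idea: fitting a whole clip at once adds constraints that single-frame methods lack, because body shape is constant and 3D positions vary slowly while the 2D evidence changes from frame to frame. The sketch below is a minimal, hypothetical illustration of such a temporal bundle-adjustment objective in Python with SciPy. The toy skeleton, bone list, fixed-focal-length camera, weights, and all function names are assumptions made for this example; the paper's actual method recovers full body meshes and is considerably more involved.

```python
# Minimal sketch of temporal bundle adjustment for 3D pose (illustrative only):
# jointly fit 3D joints for all frames of a clip so that they (a) reproject
# onto per-frame 2D keypoint detections, (b) move slowly over time, and
# (c) keep constant bone lengths (a stand-in for unchanging body shape).
import numpy as np
from scipy.optimize import least_squares

J = 5                                      # toy skeleton with 5 joints (assumed)
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]   # hypothetical parent-child pairs
FOCAL = 1000.0                             # assumed fixed camera intrinsics

def project(X):
    """Perspective projection of (T, J, 3) points to (T, J, 2) pixels."""
    return FOCAL * X[..., :2] / X[..., 2:3]

def residuals(x, kp2d, w_smooth=1.0, w_shape=10.0):
    """Stacked residuals: reprojection + temporal smoothness + constant shape."""
    T = kp2d.shape[0]
    X = x.reshape(T, J, 3)
    r_proj = (project(X) - kp2d).ravel()               # match 2D detections
    r_smooth = w_smooth * np.diff(X, axis=0).ravel()   # 3D positions vary slowly
    lengths = np.stack([np.linalg.norm(X[:, a] - X[:, b], axis=-1)
                        for a, b in BONES], axis=1)    # (T, n_bones)
    r_shape = w_shape * (lengths - lengths.mean(axis=0)).ravel()  # shape constant
    return np.concatenate([r_proj, r_smooth, r_shape])

# Toy usage: noisy 2D detections of a slowly translating synthetic skeleton.
rng = np.random.default_rng(0)
T = 8
X_true = rng.normal(size=(1, J, 3)) + np.array([0.0, 0.0, 5.0])  # in front of camera
X_true = np.repeat(X_true, T, axis=0)
X_true[:, :, 0] += 0.01 * np.arange(T)[:, None]        # slow sideways drift
kp2d = project(X_true) + rng.normal(scale=2.0, size=(T, J, 2))

# Initialize near per-frame estimates (here: perturbed ground truth) and solve.
x0 = X_true.ravel() + rng.normal(scale=0.05, size=T * J * 3)
sol = least_squares(residuals, x0, args=(kp2d,))
X_hat = sol.x.reshape(T, J, 3)
print("mean 3D joint error:", np.linalg.norm(X_hat - X_true, axis=-1).mean())
```

Note how the smoothness and shape terms couple frames together: a joint that is depth-ambiguous in one frame is pinned down by neighboring frames where the viewpoint or pose differs, which is exactly the extra constraint the abstract claims over single-frame reconstruction.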