Signs in time: Encoding human motion as a temporal image

Full description

The goal of this work is to recognise and localise short temporal signals in image time series, where strong supervision is not available for training. To this end we propose an image encoding that concisely represents human motion in a video sequence in a form that is suitable for learning with a ConvNet. The encoding reduces the pose information from an image to a single column, dramatically diminishing the input requirements for the network, but retaining the essential information for recognition. The encoding is applied to the task of recognising and localising signed gestures in British Sign Language (BSL) videos. We demonstrate that, using the proposed encoding, signs as short as 10 frames in duration can be learnt from clips lasting hundreds of frames using only weak (clip-level) supervision and with considerable label noise.

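The encoding is only summarised in the abstract above, but the core idea (collapsing each frame's pose into a single column and stacking those columns over time into an image a ConvNet can consume) can be sketched as follows. This is a minimal illustration under assumed details: the joint count, the coordinate normalisation, and the names pose_to_column and encode_clip are hypothetical, not the authors' implementation.

```python
import numpy as np

def pose_to_column(keypoints, frame_size):
    """Collapse one frame's 2D pose into a single column vector.

    keypoints  : array of shape (J, 2) holding x, y pixel coordinates of J joints.
    frame_size : (width, height), used to normalise coordinates to [0, 1].

    The exact encoding used in the paper is not spelled out in this record;
    this normalised flattening is a stand-in.
    """
    w, h = frame_size
    norm = np.asarray(keypoints, dtype=np.float32) / np.array([w, h], dtype=np.float32)
    return norm.reshape(-1)  # column of length 2 * J

def encode_clip(clip_keypoints, frame_size):
    """Stack per-frame columns left to right into a (2*J, T) temporal image."""
    cols = [pose_to_column(kp, frame_size) for kp in clip_keypoints]
    return np.stack(cols, axis=1)

# Example: a 200-frame clip with 18 joints per frame becomes a 36 x 200 image,
# with time along the horizontal axis, suitable as input to a 2D ConvNet.
rng = np.random.default_rng(0)
clip = rng.uniform(0, 256, size=(200, 18, 2))  # placeholder pose detections
temporal_image = encode_clip(clip, frame_size=(256, 256))
print(temporal_image.shape)  # (36, 200)
```
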
Bibliographic Details
Main Authors: Chung, J.; Zisserman, A.
Format: Conference item
Published: 2016
Institution: University of Oxford
Collection: OXFORD
Record ID: oxford-uuid:e8beb550-9e50-4bf4-96a4-7fa5a8cbd79c