Signs in time: Encoding human motion as a temporal image

Full description

The goal of this work is to recognise and localise short temporal signals in image time series, where strong supervision is not available for training. To this end we propose an image encoding that concisely represents human motion in a video sequence in a form that is suitable for learning with a ConvNet. The encoding reduces the pose information from an image to a single column, dramatically diminishing the input requirements for the network, but retaining the essential information for recognition. The encoding is applied to the task of recognising and localising signed gestures in British Sign Language (BSL) videos. We demonstrate that, using the proposed encoding, signs as short as 10 frames in duration can be learnt from clips lasting hundreds of frames using only weak (clip-level) supervision and with considerable label noise.

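The encoding is only summarised in the abstract above, but the core idea (collapsing each frame's pose into a single column and stacking those columns over time into an image a ConvNet can consume) can be sketched as follows. This is a minimal illustration under assumed details: the joint count, the coordinate normalisation, and the names pose_to_column and encode_clip are hypothetical, not the authors' implementation.

```python
import numpy as np

def pose_to_column(keypoints, frame_size):
    """Collapse one frame's 2D pose into a single column vector.

    keypoints  : array of shape (J, 2) holding x, y pixel coordinates of J joints.
    frame_size : (width, height), used to normalise coordinates to [0, 1].

    The exact encoding used in the paper is not spelled out in this record;
    this normalised flattening is a stand-in.
    """
    w, h = frame_size
    norm = np.asarray(keypoints, dtype=np.float32) / np.array([w, h], dtype=np.float32)
    return norm.reshape(-1)  # column of length 2 * J

def encode_clip(clip_keypoints, frame_size):
    """Stack per-frame columns left to right into a (2*J, T) temporal image."""
    cols = [pose_to_column(kp, frame_size) for kp in clip_keypoints]
    return np.stack(cols, axis=1)

# Example: a 200-frame clip with 18 joints per frame becomes a 36 x 200 image,
# with time along the horizontal axis, suitable as input to a 2D ConvNet.
rng = np.random.default_rng(0)
clip = rng.uniform(0, 256, size=(200, 18, 2))  # placeholder pose detections
temporal_image = encode_clip(clip, frame_size=(256, 256))
print(temporal_image.shape)  # (36, 200)
```
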
Bibliographic Details
Main Authors: Chung, J.; Zisserman, A.
Format: Conference item
Published: 2016
Institution: University of Oxford
Collection: OXFORD
Record ID: oxford-uuid:e8beb550-9e50-4bf4-96a4-7fa5a8cbd79c