Exploiting signed TV broadcasts for automatic learning of British Sign Language

Bibliographic Details
Main Authors: Buehler, P, Everingham, M, Zisserman, A
Format: Conference item
Language: English
Published: European Language Resources Association 2010
Description: In this work we present several contributions towards the automatic recognition of BSL signs from continuous signing video. Specifically, we address three main points: (i) automatic detection and tracking of the hands using a generative model of the image; (ii) automatic learning of signs from TV broadcasts of single signers, using only the supervisory information available from subtitles; and (iii) discriminative signer-independent sign recognition using automatically extracted training data from a single signer.

Our source material consists of many hours of video with continuous signing and corresponding subtitles recorded from BBC digital television. This material is very challenging for a number of reasons, including self-occlusions of the signer, self-shadowing, blur due to the speed of motion, and in particular the changing background.

Knowledge of hand position and hand shape is a prerequisite for automatic sign language recognition. We cast the problem of detecting and tracking the hands as inference in a generative model of the image, and propose a complete model which accounts for the positions and self-occlusions of the arms. Plausible configurations are obtained by efficiently sampling from a pictorial structure proposal distribution. Our results exceed the state of the art in the length and stability of continuous limb tracking.

Previous research in sign language recognition has typically required manually generated training data for each sign, e.g. a signer performing each sign in controlled conditions, which is a time-consuming and expensive procedure. We show that, for a given signer, a large number of BSL signs can be learned automatically from TV broadcasts using the supervisory information available from subtitles broadcast simultaneously with the signing. We achieve this by modelling the problem as one of multiple instance learning: in this way we extract the sign of interest from hours of signing footage, despite the very weak and "noisy" supervision provided by the subtitles.

Lastly, we show how the automatic recognition of signs can be extended to multiple signers. Using automatically extracted examples from a single signer, we train discriminative classifiers and show that these can successfully recognise signs for unseen signers. This demonstrates that our features (hand trajectory and hand shape) generalise well across signers, despite significant inter-personal differences in signing.
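The multiple-instance-learning formulation can be illustrated with a toy sketch (this is illustrative only, not the authors' implementation; the feature vectors, the bag construction, and the diverse-density-style score below are all hypothetical). Each subtitle-aligned stretch of video is treated as a "bag" of candidate temporal windows; a bag is positive if the target word appears in its subtitle, but the supervision does not say which window contains the sign. A window that lies close to some instance in every positive bag, yet far from all negative instances, is a good candidate for the sign:

```python
import random

random.seed(0)

# Hypothetical 2-D feature vector of the target sign (e.g. a coarse
# summary of hand trajectory / hand shape).
TRUE_SIGN = [0.8, 0.2]

def rand_window():
    """A distractor window: random features in the unit square."""
    return [random.random(), random.random()]

def make_bag(positive, n=5):
    """A bag of n candidate windows; positive bags hide one noisy
    copy of the target sign among distractors."""
    bag = [rand_window() for _ in range(n)]
    if positive:
        bag[random.randrange(n)] = [c + random.gauss(0, 0.02) for c in TRUE_SIGN]
    return bag

pos_bags = [make_bag(True) for _ in range(20)]   # subtitle contains the word
neg_bags = [make_bag(False) for _ in range(20)]  # subtitle does not

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bag_dist(inst, bag):
    """Distance from a candidate to the nearest window in a bag."""
    return min(sqdist(inst, x) for x in bag)

def mil_score(inst):
    """Lower is better: close to every positive bag, far from all
    negative instances (a crude diverse-density-style criterion)."""
    pos_term = sum(bag_dist(inst, b) for b in pos_bags)
    neg_term = min(sqdist(inst, x) for b in neg_bags for x in b)
    return pos_term - neg_term

# Pick the candidate window that best explains the weak labels.
candidates = [inst for bag in pos_bags for inst in bag]
best = min(candidates, key=mil_score)
```

Only instances drawn from positive bags are scored, since the sign must occur in each of them; the recovered `best` window lands near the planted sign, despite no per-window labels being available.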
Institution: University of Oxford