Exploiting signed TV broadcasts for automatic learning of British Sign Language

Bibliographic Details
Main Authors: Buehler, P, Everingham, M, Zisserman, A
Format: Conference item
Language: English
Published: European Language Resources Association 2010
Description: In this work we present several contributions towards the automatic recognition of BSL signs from continuous signing video. Specifically, we address three main points: (i) automatic detection and tracking of the hands using a generative model of the image; (ii) automatic learning of signs from TV broadcasts of single signers, using only the supervisory information available from subtitles; and (iii) discriminative signer-independent sign recognition using automatically extracted training data from a single signer.

Our source material consists of many hours of video with continuous signing and corresponding subtitles recorded from BBC digital television. This material is very challenging for a number of reasons, including self-occlusions of the signer, self-shadowing, blur due to the speed of motion, and in particular the changing background.

Knowledge of hand position and hand shape is a prerequisite for automatic sign language recognition. We cast the problem of detecting and tracking the hands as inference in a generative model of the image, and propose a complete model which accounts for the positions and self-occlusions of the arms. Plausible configurations are obtained by efficiently sampling from a pictorial structure proposal distribution. Our results exceed the state of the art in the length and stability of continuous limb tracking.

Previous research in sign language recognition has typically required manually generated training data for each sign, e.g. a signer performing each sign in controlled conditions, which is a time-consuming and expensive procedure. We show that, for a given signer, a large number of BSL signs can be learned automatically from TV broadcasts using the supervisory information available from subtitles broadcast simultaneously with the signing. We achieve this by modelling the problem as one of multiple instance learning: in this way we extract the sign of interest from hours of signing footage, despite the very weak and "noisy" supervision provided by the subtitles.

Lastly, we show how the automatic recognition of signs can be extended to multiple signers. Using automatically extracted examples from a single signer, we train discriminative classifiers and show that these can successfully recognise signs for unseen signers. This demonstrates that our features (hand trajectory and hand shape) generalise well across signers, despite significant inter-personal differences in signing.
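The multiple-instance-learning formulation can be illustrated with a toy sketch (this is illustrative only, not the authors' implementation; the feature vectors, the bag construction, and the diverse-density-style score below are all hypothetical). Each subtitle-aligned stretch of video is treated as a "bag" of candidate temporal windows; a bag is positive if the target word appears in its subtitle, but the supervision does not say which window contains the sign. A window that lies close to some instance in every positive bag, yet far from all negative instances, is a good candidate for the sign:

```python
import random

random.seed(0)

# Hypothetical 2-D feature vector of the target sign (e.g. a coarse
# summary of hand trajectory / hand shape).
TRUE_SIGN = [0.8, 0.2]

def rand_window():
    """A distractor window: random features in the unit square."""
    return [random.random(), random.random()]

def make_bag(positive, n=5):
    """A bag of n candidate windows; positive bags hide one noisy
    copy of the target sign among distractors."""
    bag = [rand_window() for _ in range(n)]
    if positive:
        bag[random.randrange(n)] = [c + random.gauss(0, 0.02) for c in TRUE_SIGN]
    return bag

pos_bags = [make_bag(True) for _ in range(20)]   # subtitle contains the word
neg_bags = [make_bag(False) for _ in range(20)]  # subtitle does not

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bag_dist(inst, bag):
    """Distance from a candidate to the nearest window in a bag."""
    return min(sqdist(inst, x) for x in bag)

def mil_score(inst):
    """Lower is better: close to every positive bag, far from all
    negative instances (a crude diverse-density-style criterion)."""
    pos_term = sum(bag_dist(inst, b) for b in pos_bags)
    neg_term = min(sqdist(inst, x) for b in neg_bags for x in b)
    return pos_term - neg_term

# Pick the candidate window that best explains the weak labels.
candidates = [inst for bag in pos_bags for inst in bag]
best = min(candidates, key=mil_score)
```

Only instances drawn from positive bags are scored, since the sign must occur in each of them; the recovered `best` window lands near the planted sign, despite no per-window labels being available.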
Institution: University of Oxford