Automatic and efficient human pose estimation for sign language videos


Bibliographic Details
Main Authors: Charles, J; Pfister, T; Everingham, M; Zisserman, A
Format: Journal article
Language: English
Published: Springer 2013
Collection: OXFORD
Description: We present a fully automatic arm and hand tracker that detects joint positions over continuous sign language video sequences of more than an hour in length. To achieve this, we make contributions in four areas: (i) we show that the overlaid signer can be separated from the background TV broadcast using co-segmentation over all frames with a layered model; (ii) we show that joint positions (shoulders, elbows, wrists) can be predicted per-frame using a random forest regressor given only this segmentation and a colour model; (iii) we show that the random forest can be trained from an existing semi-automatic, but computationally expensive, tracker; and (iv) we introduce an evaluator to assess whether the predicted joint positions are correct for each frame. The method is applied to 20 signing footage videos with changing backgrounds, challenging imaging conditions, and different signers. Our framework outperforms the state-of-the-art long-term tracker by Buehler et al. (International Journal of Computer Vision 95:180–197, 2011), does not require the manual annotation of that work, and, after automatic initialisation, performs tracking in real time. We also achieve superior joint localisation results to those obtained using the pose estimation method of Yang and Ramanan (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011).
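The per-frame joint prediction in contribution (ii) can be sketched with a multi-output random forest, here using scikit-learn's `RandomForestRegressor`. This is an illustrative sketch, not the authors' implementation: the feature dimension, joint count, and synthetic training data below are invented stand-ins for the paper's segmentation and colour-model features, whose labels would come from the slower semi-automatic tracker of contribution (iii).

```python
# Illustrative sketch (not the paper's code) of per-frame joint regression:
# one feature vector per frame in, 2D coordinates for each joint out.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

N_FRAMES = 400   # frames labelled by a slower, semi-automatic tracker
FEAT_DIM = 32    # e.g. a flattened, downsampled signer segmentation mask
N_JOINTS = 6     # shoulders, elbows, wrists (left + right)

# Synthetic training data: features X and (x, y) targets for each joint.
X = rng.random((N_FRAMES, FEAT_DIM))
W = rng.random((FEAT_DIM, N_JOINTS * 2))
Y = X @ W + 0.01 * rng.standard_normal((N_FRAMES, N_JOINTS * 2))

# scikit-learn forests support multi-output regression directly,
# so all 12 coordinates are predicted by one model.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, Y)

# Per-frame inference: a single feature vector yields all joint positions.
pred = forest.predict(X[:1]).reshape(N_JOINTS, 2)
print(pred.shape)  # (6, 2)
```

Because each frame is regressed independently from its features, inference is a single forest evaluation per frame, which is consistent with the real-time tracking claim after initialisation.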
ID: oxford-uuid:8e3944bc-cf2c-4e94-86fe-b57e250b940e
Institution: University of Oxford