Automatic and efficient human pose estimation for sign language videos


Bibliographic Details
Main Authors: Charles, J; Pfister, T; Everingham, M; Zisserman, A
Format: Journal article
Language: English
Published: Springer 2013
Collection: OXFORD
Description: We present a fully automatic arm and hand tracker that detects joint positions over continuous sign language video sequences of more than an hour in length. To achieve this, we make contributions in four areas: (i) we show that the overlaid signer can be separated from the background TV broadcast using co-segmentation over all frames with a layered model; (ii) we show that joint positions (shoulders, elbows, wrists) can be predicted per-frame using a random forest regressor given only this segmentation and a colour model; (iii) we show that the random forest can be trained from an existing semi-automatic, but computationally expensive, tracker; and (iv) we introduce an evaluator to assess whether the predicted joint positions are correct for each frame. The method is applied to 20 signing footage videos with changing backgrounds, challenging imaging conditions, and different signers. Our framework outperforms the state-of-the-art long-term tracker by Buehler et al. (International Journal of Computer Vision 95:180–197, 2011), does not require the manual annotation of that work, and, after automatic initialisation, performs tracking in real time. We also achieve superior joint localisation results to those obtained using the pose estimation method of Yang and Ramanan (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011).
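The per-frame joint prediction in contribution (ii) can be sketched with a multi-output random forest, here using scikit-learn's `RandomForestRegressor`. This is an illustrative sketch, not the authors' implementation: the feature dimension, joint count, and synthetic training data below are invented stand-ins for the paper's segmentation and colour-model features, whose labels would come from the slower semi-automatic tracker of contribution (iii).

```python
# Illustrative sketch (not the paper's code) of per-frame joint regression:
# one feature vector per frame in, 2D coordinates for each joint out.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

N_FRAMES = 400   # frames labelled by a slower, semi-automatic tracker
FEAT_DIM = 32    # e.g. a flattened, downsampled signer segmentation mask
N_JOINTS = 6     # shoulders, elbows, wrists (left + right)

# Synthetic training data: features X and (x, y) targets for each joint.
X = rng.random((N_FRAMES, FEAT_DIM))
W = rng.random((FEAT_DIM, N_JOINTS * 2))
Y = X @ W + 0.01 * rng.standard_normal((N_FRAMES, N_JOINTS * 2))

# scikit-learn forests support multi-output regression directly,
# so all 12 coordinates are predicted by one model.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, Y)

# Per-frame inference: a single feature vector yields all joint positions.
pred = forest.predict(X[:1]).reshape(N_JOINTS, 2)
print(pred.shape)  # (6, 2)
```

Because each frame is regressed independently from its features, inference is a single forest evaluation per frame, which is consistent with the real-time tracking claim after initialisation.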
ID: oxford-uuid:8e3944bc-cf2c-4e94-86fe-b57e250b940e
Institution: University of Oxford