Human focused action localization in video

We propose a novel human-centric approach to detect and localize human actions in challenging video data, such as Hollywood movies. Our goal is to localize actions in time through the video and spatially in each frame. We achiev...

Bibliographic Details
Main Authors:	Kläser, A, Marszałek, M, Schmid, C, Zisserman, A
Format:	Conference item
Language:	English
Published:	Springer 2012

_version_	1826313639650394112
author	Kläser, A Marszałek, M Schmid, C Zisserman, A
author_facet	Kläser, A Marszałek, M Schmid, C Zisserman, A
author_sort	Kläser, A
collection	OXFORD
description	We propose a novel human-centric approach to detect and localize human actions in challenging video data, such as Hollywood movies. Our goal is to localize actions in time through the video and spatially in each frame. We achieve this by first obtaining generic spatio-temporal human tracks and then detecting specific actions within these using a sliding window classifier. We make the following contributions: (i) We show that splitting the action localization task into spatial and temporal search leads to an efficient localization algorithm where generic human tracks can be reused to recognize multiple human actions; (ii) We develop a human detector and tracker which is able to cope with a wide range of postures, articulations, motions and camera viewpoints. The tracker includes detection interpolation and a principled classification stage to suppress false positive tracks; (iii) We propose a track-aligned 3D-HOG action representation, investigate its parameters, and show that action localization benefits from using tracks; and (iv) We introduce a new action localization dataset based on Hollywood movies. Results are presented on a number of real-world movies with crowded, dynamic environments, partial occlusion and cluttered backgrounds. On the Coffee & Cigarettes dataset we significantly improve over the state of the art. Furthermore, we obtain excellent results on the new Hollywood–Localization dataset.
first_indexed	2024-09-25T04:18:08Z
format	Conference item
id	oxford-uuid:f36a8a10-95ff-43ea-a6cb-79a5c9325658
institution	University of Oxford
language	English
last_indexed	2024-09-25T04:18:08Z
publishDate	2012
publisher	Springer
record_format	dspace
spelling	oxford-uuid:f36a8a10-95ff-43ea-a6cb-79a5c9325658 | 2024-07-23T17:26:30Z | Human focused action localization in video | Conference item | http://purl.org/coar/resource_type/c_5794 | uuid:f36a8a10-95ff-43ea-a6cb-79a5c9325658 | English | Symplectic Elements | Springer | 2012 | Kläser, A | Marszałek, M | Schmid, C | Zisserman, A
title	Human focused action localization in video

Similar Items