Human focused action localization in video

Bibliographic Details
Main Authors: Kläser, A, Marszałek, M, Schmid, C, Zisserman, A
Format: Conference item
Language: English
Published: Springer 2012
author Kläser, A
Marszałek, M
Schmid, C
Zisserman, A
collection OXFORD
description <p>We propose a novel <em>human-centric</em> approach to <em>detect and localize</em> human actions in <em>challenging</em> video data, such as Hollywood movies. Our goal is to localize actions in time through the video and spatially in each frame. We achieve this by first obtaining generic spatio-temporal human tracks and then detecting specific actions within these using a sliding window classifier.</p> <br> <p>We make the following contributions: (i) We show that splitting the action localization task into spatial and temporal search leads to an efficient localization algorithm where generic human tracks can be reused to recognize multiple human actions; (ii) We develop a human detector and tracker which is able to cope with a wide range of postures, articulations, motions and camera viewpoints. The tracker includes detection interpolation and a principled classification stage to suppress false positive tracks; (iii) We propose a track-aligned 3D-HOG action representation, investigate its parameters, and show that action localization benefits from using tracks; and (iv) We introduce a new action localization dataset based on Hollywood movies.</p> <br> <p>Results are presented on a number of <em>real-world</em> movies with crowded, dynamic environments, partial occlusion and cluttered backgrounds. On the Coffee&Cigarettes dataset we significantly improve over the state of the art. Furthermore, we obtain excellent results on the new <em>Hollywood–Localization</em> dataset.</p>
first_indexed 2024-09-25T04:18:08Z
format Conference item
id oxford-uuid:f36a8a10-95ff-43ea-a6cb-79a5c9325658
institution University of Oxford
language English
last_indexed 2024-09-25T04:18:08Z
publishDate 2012
publisher Springer
record_format dspace
spelling oxford-uuid:f36a8a10-95ff-43ea-a6cb-79a5c9325658
2024-07-23T17:26:30Z
Human focused action localization in video
Conference item
http://purl.org/coar/resource_type/c_5794
uuid:f36a8a10-95ff-43ea-a6cb-79a5c9325658
English
Symplectic Elements
Springer
2012
Kläser, A
Marszałek, M
Schmid, C
Zisserman, A
title Human focused action localization in video