Human focused action localization in video
Main Authors: | Kläser, A, Marszałek, M, Schmid, C, Zisserman, A |
---|---|
Format: | Conference item |
Language: | English |
Published: | Springer 2012 |
_version_ | 1826313639650394112 |
---|---|
author | Kläser, A Marszałek, M Schmid, C Zisserman, A |
author_facet | Kläser, A Marszałek, M Schmid, C Zisserman, A |
author_sort | Kläser, A |
collection | OXFORD |
description | <p>We propose a novel <em>human-centric</em> approach to <em>detect and localize</em> human actions in <em>challenging</em> video data, such as Hollywood movies. Our goal is to localize actions in time through the video and spatially in each frame. We achieve this by first obtaining generic spatio-temporal human tracks and then detecting specific actions within these using a sliding window classifier.</p>
<br>
<p>We make the following contributions: (i) We show that splitting the action localization task into spatial and temporal search leads to an efficient localization algorithm where generic human tracks can be reused to recognize multiple human actions; (ii) We develop a human detector and tracker which is able to cope with a wide range of postures, articulations, motions and camera viewpoints. The tracker includes detection interpolation and a principled classification stage to suppress false positive tracks; (iii) We propose a track-aligned 3D-HOG action representation, investigate its parameters, and show that action localization benefits from using tracks; and (iv) We introduce a new action localization dataset based on Hollywood movies.</p>
<br>
<p>Results are presented on a number of <em>real-world</em> movies with crowded, dynamic environments, partial occlusion and cluttered backgrounds. On the Coffee&Cigarettes dataset we significantly improve over the state of the art. Furthermore, we obtain excellent results on the new <em>Hollywood–Localization</em> dataset.</p> |
first_indexed | 2024-09-25T04:18:08Z |
format | Conference item |
id | oxford-uuid:f36a8a10-95ff-43ea-a6cb-79a5c9325658 |
institution | University of Oxford |
language | English |
last_indexed | 2024-09-25T04:18:08Z |
publishDate | 2012 |
publisher | Springer |
record_format | dspace |
spelling | oxford-uuid:f36a8a10-95ff-43ea-a6cb-79a5c9325658 2024-07-23T17:26:30Z Human focused action localization in video Conference item http://purl.org/coar/resource_type/c_5794 uuid:f36a8a10-95ff-43ea-a6cb-79a5c9325658 English Symplectic Elements Springer 2012 Kläser, A Marszałek, M Schmid, C Zisserman, A |
spellingShingle | Kläser, A Marszałek, M Schmid, C Zisserman, A Human focused action localization in video |
title | Human focused action localization in video |
title_full | Human focused action localization in video |
title_fullStr | Human focused action localization in video |
title_full_unstemmed | Human focused action localization in video |
title_short | Human focused action localization in video |
title_sort | human focused action localization in video |
work_keys_str_mv | AT klasera humanfocusedactionlocalizationinvideo AT marszałekm humanfocusedactionlocalizationinvideo AT schmidc humanfocusedactionlocalizationinvideo AT zissermana humanfocusedactionlocalizationinvideo |