Content modelling for human action detection via multidimensional approach

Video content analysis is an active research domain owing to the growing availability of audiovisual data in digital format. There is a need to automatically extract video content for efficient access, understanding, browsing and retrieval of videos. To obtain the information that is o...

Bibliographic Details
Main Authors:	Abdullah, Lili Nurliyana; Khalid, Fatimah
Format:	Article
Language:	English
Published:	Computer Science Journals 2009
Subjects:	Information storage and retrieval systems - Digital video. Optical storage devices.
Online Access:	http://psasir.upm.edu.my/id/eprint/13775/1/Content%20modelling%20for%20human%20action%20detection%20via%20multidimensional%20approach.pdf

_version_	1825945300768915456
author	Abdullah, Lili Nurliyana; Khalid, Fatimah
author_facet	Abdullah, Lili Nurliyana; Khalid, Fatimah
author_sort	Abdullah, Lili Nurliyana
collection	UPM
description	Video content analysis is an active research domain owing to the growing availability of audiovisual data in digital format. There is a need to automatically extract video content for efficient access, understanding, browsing and retrieval of videos. To obtain the information that is of interest and to provide better entertainment, tools are needed to help users extract relevant content and navigate effectively through the large amount of available video information. Existing methods do not seem to attempt to model and estimate the semantic content of the video. Detecting and interpreting human presence, actions and activities is one of the most valuable functions of the proposed framework. The general objective of this research is to analyze and process audio-video streams to build a robust audiovisual action recognition system by integrating, structuring and accessing multimodal information via a multidimensional retrieval and extraction model. The proposed technique characterizes action scenes by integrating cues obtained from both the audio and video tracks. Information is combined based on visual features (motion, edge, and visual characteristics of objects), audio features and video for recognizing action. The model uses hidden Markov models (HMM) and Gaussian mixture models (GMM) to provide a framework for fusing these features and to represent the multidimensional structure of the framework. The action-related visual cues are obtained by computing the spatio-temporal dynamic activity from the video shots and by abstracting specific visual events. Simultaneously, the audio features are analyzed by locating and computing several sound effects of action events that are embedded in the video. Finally, these audio and visual cues are combined to identify the action scenes.
Compared with using a single source, either the visual or the audio track alone, the combined audiovisual information provides more reliable performance and allows us to understand the story content of movies in more detail. To assess the usefulness of the proposed framework, several experiments were conducted, with results obtained using visual features only (77.89% precision; 72.10% recall), audio features only (62.52% precision; 48.93% recall) and combined audiovisual features (90.35% precision; 90.65% recall).
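The three precision/recall pairs above can be folded into a single figure per configuration for easier comparison. The sketch below computes the F1 score (the harmonic mean of precision and recall) for each pair. The F1 values are not reported in the record; they are derived here only from the percentages quoted in the abstract, and the names f1_score, REPORTED and summarize are hypothetical, chosen for illustration.

```python
def f1_score(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision_pct + recall_pct == 0:
        return 0.0
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


# Precision/recall pairs (percent) exactly as quoted in the abstract.
REPORTED = {
    "visual only": (77.89, 72.10),
    "audio only": (62.52, 48.93),
    "combined audiovisual": (90.35, 90.65),
}


def summarize(results):
    """Return the derived F1 score (percent, 2 decimals) per configuration."""
    return {name: round(f1_score(p, r), 2) for name, (p, r) in results.items()}


if __name__ == "__main__":
    for name, f1 in summarize(REPORTED).items():
        print(f"{name}: F1 = {f1:.2f}%")
```

On this derived measure the ranking matches the abstract's conclusion: the combined configuration scores highest and the audio-only configuration lowest.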
first_indexed	2024-03-06T07:29:21Z
format	Article
id	upm.eprints-13775
institution	Universiti Putra Malaysia
language	English
last_indexed	2024-03-06T07:29:21Z
publishDate	2009
publisher	Computer Science Journals
record_format	dspace
spelling	upm.eprints-13775 2015-11-20T08:20:10Z http://psasir.upm.edu.my/id/eprint/13775/ Content modelling for human action detection via multidimensional approach Abdullah, Lili Nurliyana; Khalid, Fatimah Computer Science Journals 2009 Article PeerReviewed application/pdf en http://psasir.upm.edu.my/id/eprint/13775/1/Content%20modelling%20for%20human%20action%20detection%20via%20multidimensional%20approach.pdf Abdullah, Lili Nurliyana and Khalid, Fatimah (2009) Content modelling for human action detection via multidimensional approach. International Journal of Image Processing, 3 (1). pp. 17-30. ISSN 1985-2304 Information storage and retrieval systems - Digital video. Optical storage devices. English
title	Content modelling for human action detection via multidimensional approach
topic	Information storage and retrieval systems - Digital video. Optical storage devices.
url	http://psasir.upm.edu.my/id/eprint/13775/1/Content%20modelling%20for%20human%20action%20detection%20via%20multidimensional%20approach.pdf

Similar Items