COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization
Abstract: Research related to computational modeling for machine-based understanding requires ground truth data for training, content analysis, and evaluation. In this paper, we present a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion.
Main Authors: | Athanasia Zlatintsi, Petros Koutras, Georgios Evangelopoulos, Nikolaos Malandrakis, Niki Efthymiou, Katerina Pastra, Alexandros Potamianos, Petros Maragos |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2017-08-01 |
Series: | EURASIP Journal on Image and Video Processing |
Subjects: | Video database; Saliency; Cross-media relations; Emotion annotation; Audio-visual events; Video summarization |
Online Access: | http://link.springer.com/article/10.1186/s13640-017-0194-1 |
author | Athanasia Zlatintsi, Petros Koutras, Georgios Evangelopoulos, Nikolaos Malandrakis, Niki Efthymiou, Katerina Pastra, Alexandros Potamianos, Petros Maragos
---|---|
collection | DOAJ |
description | Abstract: Research related to computational modeling for machine-based understanding requires ground truth data for training, content analysis, and evaluation. In this paper, we present a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion. The purpose of this database is manifold: it can be used for training and evaluation of event detection and summarization algorithms, for classification and recognition of audio-visual and cross-media events, as well as for emotion tracking. To enable comparisons with other computational models, we propose state-of-the-art algorithms, specifically a unified energy-based audio-visual framework and a method for text saliency computation, for the detection of perceptually salient events in videos. Additionally, a movie summarization system for the automatic production of summaries is presented. Two kinds of evaluation were performed: an objective evaluation based on the saliency annotation of the database and an extensive qualitative human evaluation of the automatically produced summaries, in which we investigated what constitutes a high-quality movie summary. Both evaluations verified the appropriateness of the proposed methods. The annotation of the database and the code for the summarization system can be found at http://cognimuse.cs.ntua.gr/database .
format | Article |
id | doaj.art-8227720ba6b24450ae51c338d8c28908 |
institution | Directory Open Access Journal |
issn | 1687-5281 |
language | English |
publishDate | 2017-08-01 |
publisher | SpringerOpen |
series | EURASIP Journal on Image and Video Processing |
affiliations | Athanasia Zlatintsi, Petros Koutras, Niki Efthymiou, Alexandros Potamianos, Petros Maragos: School of Electrical and Computer Engineering, National Technical University of Athens; Georgios Evangelopoulos: McGovern Institute for Brain Research, MIT; Nikolaos Malandrakis: Signal Analysis and Interpretation Laboratory (SAIL), USC; Katerina Pastra: Cognitive Systems Research Institute
title | COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization |
topic | Video database; Saliency; Cross-media relations; Emotion annotation; Audio-visual events; Video summarization
url | http://link.springer.com/article/10.1186/s13640-017-0194-1 |
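The abstract describes a pipeline in which per-modality (audio, visual, text) saliency curves are computed and the most salient segments are then selected to form a movie summary. Below is a minimal, illustrative Python sketch of that general idea only; the linear fusion, the fixed 2-second segments, the summary ratio, and all function names are assumptions made for this example and are not the authors' implementation, which is available at http://cognimuse.cs.ntua.gr/database .

```python
# Illustrative sketch only: fuse per-frame saliency streams and keep the
# top-scoring fixed-length segments as a summary. All parameters (weights,
# segment length, summary ratio) are assumptions, not values from the paper.
import numpy as np


def fuse_saliency(audio, visual, text, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three per-frame saliency curves (values in [0, 1])."""
    w = np.asarray(weights, dtype=float)
    streams = np.vstack([audio, visual, text])            # shape: (3, n_frames)
    return (w[:, None] * streams).sum(axis=0) / w.sum()   # shape: (n_frames,)


def select_summary_frames(saliency, fps=25, segment_sec=2.0, summary_ratio=0.05):
    """Rank fixed-length segments by mean saliency and keep the top-ranked ones,
    returned as a chronological boolean frame mask."""
    seg_len = int(round(segment_sec * fps))
    n_segs = len(saliency) // seg_len
    seg_scores = saliency[: n_segs * seg_len].reshape(n_segs, seg_len).mean(axis=1)
    n_keep = max(1, int(round(summary_ratio * n_segs)))
    keep = np.sort(np.argsort(seg_scores)[::-1][:n_keep])  # chronological order
    mask = np.zeros(len(saliency), dtype=bool)
    for s in keep:
        mask[s * seg_len:(s + 1) * seg_len] = True
    return mask  # True = frame goes into the summary


if __name__ == "__main__":
    # Synthetic curves stand in for real audio/visual/text saliency scores here.
    rng = np.random.default_rng(0)
    n_frames = 25 * 60 * 5  # 5 minutes at 25 fps
    audio, visual, text = rng.random((3, n_frames))
    fused = fuse_saliency(audio, visual, text)
    summary_mask = select_summary_frames(fused)
    print(f"kept {summary_mask.sum()} of {n_frames} frames for the summary")
```

A real system would operate on saliency curves extracted from the actual audio-visual and text streams (and might use a different fusion scheme than a fixed weighted average), but the segment-ranking step above conveys how a frame-level saliency annotation such as the one provided in COGNIMUSE can be turned into a summary selection.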