COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

Abstract Research related to computational modeling for machine-based understanding requires ground truth data for training, content analysis, and evaluation. In this paper, we present a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media se...


Bibliographic Details
Main Authors: Athanasia Zlatintsi, Petros Koutras, Georgios Evangelopoulos, Nikolaos Malandrakis, Niki Efthymiou, Katerina Pastra, Alexandros Potamianos, Petros Maragos
Format: Article
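Language: English
Published: SpringerOpen 2017-08-01
Series: EURASIP Journal on Image and Video Processing
Subjects: Video database; Saliency; Cross-media relations; Emotion annotation; Audio-visual events; Video summarization
Online Access: http://link.springer.com/article/10.1186/s13640-017-0194-1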
author Athanasia Zlatintsi
Petros Koutras
Georgios Evangelopoulos
Nikolaos Malandrakis
Niki Efthymiou
Katerina Pastra
Alexandros Potamianos
Petros Maragos
author_sort Athanasia Zlatintsi
collection DOAJ
description Abstract Research related to computational modeling for machine-based understanding requires ground truth data for training, content analysis, and evaluation. In this paper, we present a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion. The database serves several purposes: it can be used for training and evaluation of event detection and summarization algorithms, for classification and recognition of audio-visual and cross-media events, and for emotion tracking. In order to enable comparisons with other computational models, we propose state-of-the-art algorithms, specifically a unified energy-based audio-visual framework and a method for text saliency computation, for the detection of perceptually salient events from videos. Additionally, a movie summarization system for the automatic production of summaries is presented. Two kinds of evaluation were performed: an objective one based on the saliency annotation of the database, and an extensive qualitative human evaluation of the automatically produced summaries, in which we investigated what constitutes a high-quality movie summary. Both evaluations verified the appropriateness of the proposed methods. The annotation of the database and the code for the summarization system can be found at http://cognimuse.cs.ntua.gr/database.
first_indexed 2024-04-12T00:23:23Z
format Article
id doaj.art-8227720ba6b24450ae51c338d8c28908
institution Directory Open Access Journal
issn 1687-5281
language English
last_indexed 2024-04-12T00:23:23Z
publishDate 2017-08-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Image and Video Processing
spelling doaj.art-8227720ba6b24450ae51c338d8c28908
2022-12-22T03:55:39Z
eng
SpringerOpen
EURASIP Journal on Image and Video Processing
1687-5281
2017-08-01
20171124
10.1186/s13640-017-0194-1
COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization
Athanasia Zlatintsi (0): School of Electr.& Comp. Enginr., National Technical University of Athens
Petros Koutras (1): School of Electr.& Comp. Enginr., National Technical University of Athens
Georgios Evangelopoulos (2): McGovern Institute for Brain Research at MIT MIT
Nikolaos Malandrakis (3): Signal Analysis and Interpretation Laboratory (SAIL), USC
Niki Efthymiou (4): School of Electr.& Comp. Enginr., National Technical University of Athens
Katerina Pastra (5): Cognitive Systems Research Institute
Alexandros Potamianos (6): School of Electr.& Comp. Enginr., National Technical University of Athens
Petros Maragos (7): School of Electr.& Comp. Enginr., National Technical University of Athens
http://link.springer.com/article/10.1186/s13640-017-0194-1
Video database; Saliency; Cross-media relations; Emotion annotation; Audio-visual events; Video summarization
title COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization
topic Video database
Saliency
Cross-media relations
Emotion annotation
Audio-visual events
Video summarization
url http://link.springer.com/article/10.1186/s13640-017-0194-1