Learning structured video descriptions: Automated video knowledge extraction for video understanding tasks

Vision-to-language problems, such as video annotation or visual question answering, stand out from perceptual video understanding tasks (e.g., classification) through their cognitive nature and their tight connection to the field of natural language processing. While most of the current solutio...

Bibliographic details
Main authors:	Vasile, D; Lukasiewicz, T
Format:	Conference item
Published:	Springer 2018

_version_	1826303038751506432
author	Vasile, D; Lukasiewicz, T
collection	OXFORD
description	Vision-to-language problems, such as video annotation or visual question answering, stand out from perceptual video understanding tasks (e.g., classification) through their cognitive nature and their tight connection to the field of natural language processing. While most current solutions to vision-to-language problems are inspired by machine translation methods and aim to map visual features directly to text, several recent results on image and video understanding have shown the importance of specifically and formally representing the semantic content of a visual scene before reasoning over it and mapping it to natural language. This paper proposes a deep learning solution to the problem of generating structured descriptions for videos and evaluates it on a dataset of formally annotated videos, which was automatically generated as part of this work. The recorded results confirm the potential of the solution, indicating that it describes the semantic content of a video scene with accuracy similar to that of state-of-the-art natural language captioning models.
first_indexed	2024-03-07T05:56:33Z
format	Conference item
id	oxford-uuid:eab50226-444a-4620-83a1-0e43185a81ae
institution	University of Oxford
last_indexed	2024-03-07T05:56:33Z
publishDate	2018
publisher	Springer
record_format	dspace
spelling	oxford-uuid:eab50226-444a-4620-83a1-0e43185a81ae; 2022-03-27T11:04:14Z; Learning structured video descriptions: Automated video knowledge extraction for video understanding tasks; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:eab50226-444a-4620-83a1-0e43185a81ae; Symplectic Elements at Oxford; Springer; 2018; Vasile, D; Lukasiewicz, T
title	Learning structured video descriptions: Automated video knowledge extraction for video understanding tasks

Similar items