Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey

While describing visual data is a trivial task for humans, it is an intricate task for a computer. This is even more challenging if the visual data is a video. Comprehending a video and describing it is called Video Captioning. This involves understanding the semantics of a video and then generating...

Bibliographic Details
Main Authors:	Khushboo Khurana, Umesh Deshpande
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Video question answering, video captioning, video description generation, natural language processing, deep learning, computer vision
Online Access:	https://ieeexplore.ieee.org/document/9350580/

_version_	1819276872194719744
author	Khushboo Khurana, Umesh Deshpande
author_facet	Khushboo Khurana, Umesh Deshpande
author_sort	Khushboo Khurana
collection	DOAJ
description	While describing visual data is a trivial task for humans, it is an intricate task for a computer. This is even more challenging if the visual data is a video. Comprehending a video and describing it is called Video Captioning. This involves understanding the semantics of a video and then generating human-like descriptions of the video. It requires the collaboration of both research communities of computer vision and natural language processing. The captions generated by video captioning can be further utilized for video retrieval, summarization, question-answering, etc. Video Question-Answering (video-QA) involves querying the system to obtain an answer in response. This paper presents a brief survey of the video captioning techniques and a comprehensive review of existing techniques, datasets, and evaluation metrics for the task of video-QA. Video-QA techniques rely on the attention mechanism to generate relevant results. The presented survey shows that recent works on Memory Networks, Generative Adversarial Networks, and Reinforced Decoders, have the capability to handle the complexities and challenges of video-QA. Additionally, the graph-based methods, although less explored, give very promising results. In this article, we have discussed the emerging research directions and various application areas of video-QA.
first_indexed	2024-12-23T23:47:07Z
format	Article
id	doaj.art-4a198f41a0c94edda47fe75e8c4940ec
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-23T23:47:07Z
publishDate	2021-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-4a198f41a0c94edda47fe75e8c4940ec | 2022-12-21T17:25:29Z | eng | IEEE | IEEE Access | 2169-3536 | 2021-01-01 | 9 | 43799 | 43823 | 10.1109/ACCESS.2021.3058248 | 9350580 | Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey | Khushboo Khurana | 0 | https://orcid.org/0000-0002-4751-1778 | Umesh Deshpande | 1 | Department of Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur, India | Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India | While describing visual data is a trivial task for humans, it is an intricate task for a computer. This is even more challenging if the visual data is a video. Comprehending a video and describing it is called Video Captioning. This involves understanding the semantics of a video and then generating human-like descriptions of the video. It requires the collaboration of both research communities of computer vision and natural language processing. The captions generated by video captioning can be further utilized for video retrieval, summarization, question-answering, etc. Video Question-Answering (video-QA) involves querying the system to obtain an answer in response. This paper presents a brief survey of the video captioning techniques and a comprehensive review of existing techniques, datasets, and evaluation metrics for the task of video-QA. Video-QA techniques rely on the attention mechanism to generate relevant results. The presented survey shows that recent works on Memory Networks, Generative Adversarial Networks, and Reinforced Decoders, have the capability to handle the complexities and challenges of video-QA. Additionally, the graph-based methods, although less explored, give very promising results. In this article, we have discussed the emerging research directions and various application areas of video-QA. | https://ieeexplore.ieee.org/document/9350580/ | Video question answering | video captioning | video description generation | natural language processing | deep learning | computer vision
spellingShingle	Khushboo Khurana Umesh Deshpande Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey IEEE Access Video question answering video captioning video description generation natural language processing deep learning computer vision
title	Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey
title_full	Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey
title_fullStr	Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey
title_full_unstemmed	Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey
title_short	Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey
title_sort	video question answering techniques benchmark datasets and evaluation metrics leveraging video captioning a comprehensive survey
topic	Video question answering, video captioning, video description generation, natural language processing, deep learning, computer vision
url	https://ieeexplore.ieee.org/document/9350580/
work_keys_str_mv	AT khushbookhurana videoquestionansweringtechniquesbenchmarkdatasetsandevaluationmetricsleveragingvideocaptioningacomprehensivesurvey AT umeshdeshpande videoquestionansweringtechniquesbenchmarkdatasetsandevaluationmetricsleveragingvideocaptioningacomprehensivesurvey

Similar Items