Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey

While describing visual data is a trivial task for humans, it is an intricate task for a computer. This is even more challenging if the visual data is a video. Comprehending a video and describing it is called Video Captioning. This involves understanding the semantics of a video and then generating...

Bibliographic Details
Main Authors:	Khushboo Khurana, Umesh Deshpande
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Video question answering; video captioning; video description generation; natural language processing; deep learning; computer vision
Online Access:	https://ieeexplore.ieee.org/document/9350580/

Similar Items

TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering
by: Tian Wang, et al.
Published: (2023-04-01)

Real-time Arabic Video Captioning Using CNN and Transformer Networks Based on Parallel Implementation
by: Adel Jalal Yousif, et al.
Published: (2024-03-01)

DeepRide: Dashcam Video Description Dataset for Autonomous Vehicle Location-Aware Trip Description
by: Ghazala Rafiq, et al.
Published: (2022-01-01)

Bilingual video captioning model for enhanced video retrieval
by: Norah Alrebdi, et al.
Published: (2024-01-01)

Exploring deep learning approaches for video captioning: A comprehensive review
by: Adel Jalal Yousif, et al.
Published: (2023-12-01)

Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods
by: Mohammad Saif Wajid, et al.
Published: (2024-01-01)

Video Description: Datasets & Evaluation Metrics
by: Muhammad Rafiq, et al.
Published: (2021-01-01)

Step by Step: A Gradual Approach for Dense Video Captioning
by: Wangyu Choi, et al.
Published: (2023-01-01)

A Video Question Answering Model Based on Knowledge Distillation
by: Zhuang Shao, et al.
Published: (2023-06-01)

Parallel Pathway Dense Video Captioning With Deformable Transformer
by: Wangyu Choi, et al.
Published: (2022-01-01)

CapERA: Captioning Events in Aerial Videos
by: Laila Bashmal, et al.
Published: (2023-04-01)

Parallel Dense Video Caption Generation with Multi-Modal Features
by: Xuefei Huang, et al.
Published: (2023-08-01)

Adaptive Curriculum Learning for Video Captioning
by: Shanhao Li, et al.
Published: (2022-01-01)

Fusion of Multi-Modal Features to Enhance Dense Video Caption
by: Xuefei Huang, et al.
Published: (2023-06-01)

Evaluation metrics for video captioning: A survey
by: Andrei de Souza Inácio, et al.
Published: (2023-09-01)

UAT: Universal Attention Transformer for Video Captioning
by: Heeju Im, et al.
Published: (2022-06-01)

Multi-Shared Attention with Global and Local Pathways for Video Question Answering
by: WANG Lei-quan, HOU Wen-yan, YUAN Shao-zu, ZHAO Xin, LIN Yao, WU Chun-lei
Published: (2021-08-01)

PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning With Pretraining Approach
by: Wangyu Choi, et al.
Published: (2023-01-01)

Video Caption Based Searching Using End-to-End Dense Captioning and Sentence Embeddings
by: Akshay Aggarwal, et al.
Published: (2020-06-01)

Action knowledge for video captioning with graph neural networks
by: Willy Fitra Hendria, et al.
Published: (2023-04-01)

Temporally Multi-Modal Semantic Reasoning with Spatial Language Constraints for Video Question Answering
by: Mingyang Liu, et al.
Published: (2022-05-01)

Video Captions for Online Courses: Do YouTube’s Auto-generated Captions Meet Deaf Students’ Needs?
by: Becky Sue Parton
Published: (2016-08-01)

MFVC: Urban Traffic Scene Video Caption Based on Multimodal Fusion
by: Mingxing Li, et al.
Published: (2022-09-01)

Quality Enhancement Based Video Captioning in Video Communication Systems
by: The Van Le, et al.
Published: (2024-01-01)

Teaching Medical English through Professional Captioning Videos
by: Džuganová Božena
Published: (2019-09-01)

Cross-modal graph with meta concepts for video captioning
by: Wang, Hao, et al.
Published: (2022)

Comparing the effectiveness of explicit EAL feedback through slideshow (text+audio) and captioned video
by: Jonathan Harrison
Published: (2022-04-01)

The why-what-when-who-how of using captioned videos as an instructional aid in EAL classrooms: Theoretical perspectives and classroom implications
by: Trinh Thai Van Phuc
Published: (2022-06-01)

A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
by: An-An Liu, et al.
Published: (2018-01-01)

Video captioning with stacked attention and semantic hard pull
by: Md. Mushfiqur Rahman, et al.
Published: (2021-08-01)

Video Captioning Based on Channel Soft Attention and Semantic Reconstructor
by: Zhou Lei, et al.
Published: (2021-02-01)

Vision-Text Cross-Modal Fusion for Accurate Video Captioning
by: Kaouther Ouenniche, et al.
Published: (2023-01-01)

Att-BiL-SL: Attention-Based Bi-LSTM and Sequential LSTM for Describing Video in the Textual Formation
by: Shakil Ahmed, et al.
Published: (2021-12-01)

Automatic Image and Video Caption Generation With Deep Learning: A Concise Review and Algorithmic Overlap
by: Soheyla Amirian, et al.
Published: (2020-01-01)

A Multimodal Framework for Video Caption Generation
by: Reshmi S. Bhooshan, et al.
Published: (2022-01-01)

Semantic-filtered Soft-Split-Aware video captioning with audio-augmented feature
by: Xu, Yuecong, et al.
Published: (2021)

Video Captioning With Adaptive Attention and Mixed Loss Optimization
by: Huanhou Xiao, et al.
Published: (2019-01-01)

A Semantics-Assisted Video Captioning Model Trained With Scheduled Sampling
by: Haoran Chen, et al.
Published: (2020-09-01)

Multi-Task Video Captioning with a Stepwise Multimodal Encoder
by: Zihao Liu, et al.
Published: (2022-08-01)