Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey
While describing visual data is a trivial task for humans, it is an intricate task for a computer. This is even more challenging if the visual data is a video. Comprehending a video and describing it is called Video Captioning. This involves understanding the semantics of a video and then generating...
Main Authors: | Khushboo Khurana, Umesh Deshpande |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
Online Access: | https://ieeexplore.ieee.org/document/9350580/ |
Similar Items
- TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering
  by: Tian Wang, et al.
  Published: (2023-04-01)
- Real-time Arabic Video Captioning Using CNN and Transformer Networks Based on Parallel Implementation
  by: Adel Jalal Yousif, et al.
  Published: (2024-03-01)
- DeepRide: Dashcam Video Description Dataset for Autonomous Vehicle Location-Aware Trip Description
  by: Ghazala Rafiq, et al.
  Published: (2022-01-01)
- Bilingual video captioning model for enhanced video retrieval
  by: Norah Alrebdi, et al.
  Published: (2024-01-01)
- Exploring deep learning approaches for video captioning: A comprehensive review
  by: Adel Jalal Yousif, et al.
  Published: (2023-12-01)