Parallel Dense Video Caption Generation with Multi-Modal Features
The task of dense video captioning is to generate detailed natural-language descriptions for a video, which requires deep analysis and mining of semantic content to identify the events in the video. Existing methods typically follow a localisation-then-captioning sequence within given frame s...
Main Authors: Xuefei Huang, Ka-Hou Chan, Wei Ke, Hao Sheng
Format: Article
Language: English
Published: MDPI AG, 2023-08-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/11/17/3685
Similar Items
- Fusion of Multi-Modal Features to Enhance Dense Video Caption, by Xuefei Huang, et al. (2023-06-01)
- Towards Human-Interactive Controllable Video Captioning with Efficient Modeling, by Yoonseok Heo, et al. (2024-06-01)
- PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning With Pretraining Approach, by Wangyu Choi, et al. (2023-01-01)
- Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods, by Mohammad Saif Wajid, et al. (2024-01-01)
- Real-time Arabic Video Captioning Using CNN and Transformer Networks Based on Parallel Implementation, by Adel Jalal Yousif, et al. (2024-03-01)