Dense video captioning based on local attention
Abstract: Dense video captioning aims to locate multiple events in an untrimmed video and generate a caption for each event. Previous methods struggled to establish the multimodal feature relationship between frames and captions, resulting in low accuracy of the generated captions. T...
Main Authors: Yong Qian, Yingchi Mao, Zhihao Chen, Chang Li, Olano Teah Bloh, Qian Huang
Format: Article
Language: English
Published: Wiley, 2023-07-01
Series: IET Image Processing
Online Access: https://doi.org/10.1049/ipr2.12819
Similar Items
- Step by Step: A Gradual Approach for Dense Video Captioning
  by: Wangyu Choi, et al.
  Published: (2023-01-01)
- Parallel Dense Video Caption Generation with Multi-Modal Features
  by: Xuefei Huang, et al.
  Published: (2023-08-01)
- Fusion of Multi-Modal Features to Enhance Dense Video Caption
  by: Xuefei Huang, et al.
  Published: (2023-06-01)
- PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning With Pretraining Approach
  by: Wangyu Choi, et al.
  Published: (2023-01-01)
- Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph
  by: Shixing Han, et al.
  Published: (2023-02-01)