Dense video captioning based on local attention

Abstract: Dense video captioning aims to locate multiple events in an untrimmed video and generate captions for each event. Previous methods experienced difficulties in establishing the multimodal feature relationship between frames and captions, resulting in low accuracy of the generated captions. T...

Bibliographic Details
Main Authors: Yong Qian, Yingchi Mao, Zhihao Chen, Chang Li, Olano Teah Bloh, Qian Huang
Format: Article
Language: English
Published: Wiley 2023-07-01
Series: IET Image Processing
Online Access: https://doi.org/10.1049/ipr2.12819

Similar Items