Multi-Task Video Captioning with a Stepwise Multimodal Encoder

Video captioning aims to generate a grammatical and accurate sentence to describe a video. Recent methods have mainly tackled this problem by considering multiple modalities, yet they have neglected the difference in modalities and the importance of shrinking the gap between video and text. This pap...

Bibliographic Details
Main Authors: Zihao Liu, Xiaoyu Wu, Ying Yu
Format: Article
Language: English
Published: MDPI AG 2022-08-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/11/17/2639