Multi-Task Video Captioning with a Stepwise Multimodal Encoder
Video captioning aims to generate a grammatical and accurate sentence to describe a video. Recent methods have mainly tackled this problem by considering multiple modalities, yet they have neglected the difference in modalities and the importance of shrinking the gap between video and text. This pap...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2022-08-01
|
Series: | Electronics |
Subjects: | |
Online Access: | https://www.mdpi.com/2079-9292/11/17/2639 |