A Multimodal Framework for Video Caption Generation

Video captioning is a highly challenging computer vision task that automatically describes video clips in natural language sentences with a clear understanding of the embedded semantics. In this work, a video caption generation framework consisting of discrete wavelet convolutional neural arc...

Bibliographic Details
Main Authors:	Reshmi S. Bhooshan, Suresh K.
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Video captioning; discrete wavelet convolutional model; multimodal feature extraction; visual attention predictor
Online Access:	https://ieeexplore.ieee.org/document/9869626/

Similar Items