Fusion of Multi-Modal Features to Enhance Dense Video Caption

Dense video caption is a task that aims to help computers analyze the content of a video by generating abstract captions for a sequence of video frames. However, most of the existing methods only use visual features in the video and ignore the audio features that are also essential for understanding...

Bibliographic Details
Main Authors:	Xuefei Huang, Ka-Hou Chan, Weifan Wu, Hao Sheng, Wei Ke
Format:	Article
Language:	English
Published:	MDPI AG 2023-06-01
Series:	Sensors
Subjects:	dense video caption; video captioning; multi-modal feature fusion; feature extraction; neural network
Online Access:	https://www.mdpi.com/1424-8220/23/12/5565

Similar Items