Semantic-filtered Soft-Split-Aware video captioning with audio-augmented feature
Automatic video description, or video captioning, is a challenging yet highly attractive task. It aims to combine video with text. Multiple methods have been proposed based on neural networks, utilizing Convolutional Neural Networks (CNN) to extract features, and Recurrent Neural Networks (RNN) to enc...
Main Authors: | Xu, Yuecong, Yang, Jianfei, Mao, Kezhi |
---|---|
Other Authors: | School of Electrical and Electronic Engineering |
Format: | Journal Article |
Language: | English |
Published: | 2021 |
Online Access: | https://hdl.handle.net/10356/151341 |
Similar Items
- Step by Step: A Gradual Approach for Dense Video Captioning
  by: Wangyu Choi, et al.
  Published: (2023-01-01)
- Cross-modal graph with meta concepts for video captioning
  by: Wang, Hao, et al.
  Published: (2022)
- Video Captioning Based on Channel Soft Attention and Semantic Reconstructor
  by: Zhou Lei, et al.
  Published: (2021-02-01)
- Towards Human-Interactive Controllable Video Captioning with Efficient Modeling
  by: Yoonseok Heo, et al.
  Published: (2024-06-01)
- Bilingual video captioning model for enhanced video retrieval
  by: Norah Alrebdi, et al.
  Published: (2024-01-01)