Dense video captioning based on local attention
Abstract: Dense video captioning aims to locate multiple events in an untrimmed video and generate a caption for each event. Previous methods struggled to establish the multimodal feature relationship between frames and captions, resulting in low accuracy of the generated captions. T...
Main Authors: Yong Qian, Yingchi Mao, Zhihao Chen, Chang Li, Olano Teah Bloh, Qian Huang
Format: Article
Language: English
Published: Wiley, 2023-07-01
Series: IET Image Processing
Online Access: https://doi.org/10.1049/ipr2.12819
Similar Items
- Step by Step: A Gradual Approach for Dense Video Captioning
  by: Wangyu Choi, et al.
  Published: (2023-01-01)
- Parallel Dense Video Caption Generation with Multi-Modal Features
  by: Xuefei Huang, et al.
  Published: (2023-08-01)
- Fusion of Multi-Modal Features to Enhance Dense Video Caption
  by: Xuefei Huang, et al.
  Published: (2023-06-01)
- PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning With Pretraining Approach
  by: Wangyu Choi, et al.
  Published: (2023-01-01)
- Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph
  by: Shixing Han, et al.
  Published: (2023-02-01)