Parallel Dense Video Caption Generation with Multi-Modal Features
The task of dense video captioning is to generate detailed natural-language descriptions for a video, which requires deep analysis and mining of semantic content to identify the events in the video. Existing methods typically follow a localisation-then-captioning sequence within given frame s...
Main Authors: Xuefei Huang, Ka-Hou Chan, Wei Ke, Hao Sheng
Format: Article
Language: English
Published: MDPI AG, 2023-08-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/11/17/3685
Similar Items
- Fusion of Multi-Modal Features to Enhance Dense Video Caption, by Xuefei Huang, et al. (2023-06-01)
- Towards Human-Interactive Controllable Video Captioning with Efficient Modeling, by Yoonseok Heo, et al. (2024-06-01)
- PWS-DVC: Enhancing Weakly Supervised Dense Video Captioning With Pretraining Approach, by Wangyu Choi, et al. (2023-01-01)
- Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods, by Mohammad Saif Wajid, et al. (2024-01-01)
- Real-time Arabic Video Captioning Using CNN and Transformer Networks Based on Parallel Implementation, by Adel Jalal Yousif, et al. (2024-03-01)