Enhancing performance in video grounding tasks through the use of captions
This report explores enhancing video grounding tasks by utilizing generated captions, addressing the challenge posed by sparse annotations in video datasets. We took inspiration from the PCNet model, which uses caption-guided attention to fuse the captions generated by Parallel Dynamic Video Captioni...
Main Author: Liu, Xinran
Other Authors: Sun, Aixin
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/175356
Similar Items
- Enhancing performance in video grounding tasks through the use of attention module
  by: Do Duc Anh
  Published: (2024)
- Neural image and video captioning
  by: Lam, Ting En
  Published: (2024)
- Grounded semantic parsing using captioned videos
  by: Ross, Candace Cheronda
  Published: (2018)
- Caption-Guided Interpretable Video Anomaly Detection Based on Memory Similarity
  by: Yuzhi Shi, et al.
  Published: (2024-01-01)
- Neural tracking of phrases in spoken language comprehension is automatic and task-dependent
  by: Sanne ten Oever, et al.
  Published: (2022-07-01)