Contrasting Multi-Modal Similarity Framework for Video Scene Segmentation
This paper proposes a video scene segmentation framework referred to as Contrasting Multi-Modal Similarity (CMS). A video is composed of multiple scenes, which are short stories or semantic units of the video, and each scene consists of multiple shots. The task of video scene segmentation aims to sema...
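The abstract above describes the task only at a high level. As a purely illustrative sketch (not the CMS framework proposed in the paper), the following Python snippet shows one minimal reading of the video → scene → shot hierarchy: consecutive shots are grouped into a scene until the similarity between adjacent shot embeddings drops below an assumed threshold. All names and the threshold value are hypothetical.

```python
# Illustrative sketch only: groups consecutive shots into scenes by thresholding
# the similarity of adjacent shot embeddings. This is NOT the paper's CMS method.
from typing import List
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two shot feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def segment_shots_into_scenes(shot_features: List[np.ndarray],
                              threshold: float = 0.5) -> List[List[int]]:
    """Group consecutive shot indices into scenes.

    A new scene starts whenever the similarity between the current shot and
    the previous one falls below `threshold` (an assumed scene boundary).
    """
    if not shot_features:
        return []
    scenes: List[List[int]] = [[0]]
    for i in range(1, len(shot_features)):
        if cosine_similarity(shot_features[i - 1], shot_features[i]) < threshold:
            scenes.append([i])       # boundary detected: open a new scene
        else:
            scenes[-1].append(i)     # same scene continues
    return scenes


# Toy usage: six random 128-dimensional shot embeddings.
rng = np.random.default_rng(0)
shots = [rng.normal(size=128) for _ in range(6)]
print(segment_shots_into_scenes(shots, threshold=0.1))
```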
| Main Authors: | Jinwoo Park, Jungeun Kim, Jaegwang Seok, Sukhyun Lee, Junyeong Kim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10445461/ |
Similar Items

- Tencent AVS: A Holistic Ads Video Dataset for Multi-Modal Scene Segmentation
  by: Jie Jiang, et al.
  Published: (2022-01-01)
- CMANet: Cross-Modality Attention Network for Indoor-Scene Semantic Segmentation
  by: Longze Zhu, et al.
  Published: (2022-11-01)
- Lightweight dense video captioning with cross-modal attention and knowledge-enhanced unbiased scene graph
  by: Shixing Han, et al.
  Published: (2023-02-01)
- Dynamic Debiasing Network for Visual Commonsense Generation
  by: Jungeun Kim, et al.
  Published: (2023-01-01)
- SpatialScene2Vec: A self-supervised contrastive representation learning method for spatial scene similarity evaluation
  by: Danhuai Guo, et al.
  Published: (2024-04-01)