Natural-Language-Driven Multimodal Representation Learning for Audio-Visual Scene-Aware Dialog System
With the development of multimedia systems in wireless environments, there is a rising need for artificial intelligence systems that can communicate properly with humans, with a comprehensive, human-like understanding of various types of information. Therefore, this paper addresses...
Main Authors: | Yoonseok Heo, Sangwoo Kang, Jungyun Seo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-09-01 |
Series: | Sensors |
Subjects: | |
Online Access: | https://www.mdpi.com/1424-8220/23/18/7875 |
Similar Items
- A Simple Framework for Scene Graph Reasoning with Semantic Understanding of Complex Sentence Structure
  by: Yoonseok Heo, et al.
  Published: (2023-08-01)
- Analysis of Multimodal Fusion Techniques for Audio-Visual Speech Recognition
  by: D.V. Ivanko, et al.
  Published: (2016-05-01)
- Multimodal Prompt Learning in Emotion Recognition Using Context and Audio Information
  by: Eunseo Jeong, et al.
  Published: (2023-06-01)
- Impact of Video Compression and Multimodal Embedding on Scene Description
  by: Jin Young Lee
  Published: (2019-08-01)
- Deep Multimodal Representation Learning: A Survey
  by: Wenzhong Guo, et al.
  Published: (2019-01-01)