Natural-Language-Driven Multimodal Representation Learning for Audio-Visual Scene-Aware Dialog System

With the development of multimedia systems in wireless environments, there is a rising need for artificial intelligence systems that can communicate properly with humans, comprehensively understanding various types of information in a human-like manner. Therefore, this paper addresses...

Bibliographic Details
Main Authors:	Yoonseok Heo, Sangwoo Kang, Jungyun Seo
Format:	Article
Language:	English
Published:	MDPI AG 2023-09-01
Series:	Sensors
Subjects:	multimodal deep learning; audio-visual scene-aware dialog system; event keyword driven multimodal representation learning
Online Access:	https://www.mdpi.com/1424-8220/23/18/7875

Similar Items

A Simple Framework for Scene Graph Reasoning with Semantic Understanding of Complex Sentence Structure
by: Yoonseok Heo, et al.
Published: (2023-08-01)

ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION
by: D.V. Ivanko, et al.
Published: (2016-05-01)

Multimodal Prompt Learning in Emotion Recognition Using Context and Audio Information
by: Eunseo Jeong, et al.
Published: (2023-06-01)

Impact of Video Compression and Multimodal Embedding on Scene Description
by: Jin Young Lee
Published: (2019-08-01)

Deep Multimodal Representation Learning: A Survey
by: Wenzhong Guo, et al.
Published: (2019-01-01)

Multimodal Sensor-Input Architecture with Deep Learning for Audio-Visual Speech Recognition in Wild
by: Yibo He, et al.
Published: (2023-02-01)

Multi-View Attention Network for Visual Dialog
by: Sungjin Park, et al.
Published: (2021-03-01)

Multimodal Fusion Remote Sensing Image–Audio Retrieval
by: Rui Yang, et al.
Published: (2022-01-01)

Automatic Spatial Audio Scene Classification in Binaural Recordings of Music
by: Sławomir K. Zieliński, et al.
Published: (2019-04-01)

Editorial: Advances in multimodal learning: pedagogies, technologies, and analytics
by: Heng Luo
Published: (2023-10-01)

MFVC: Urban Traffic Scene Video Caption Based on Multimodal Fusion
by: Mingxing Li, et al.
Published: (2022-09-01)

PolSAR Scene Classification via Low-Rank Constrained Multimodal Tensor Representation
by: Bo Ren, et al.
Published: (2022-06-01)

Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification
by: Matthias Dorfer, et al.
Published: (2018-09-01)

A multimodal dialog approach to mental state characterization in clinically depressed, anxious, and suicidal populations
by: Joshua Cohen, et al.
Published: (2023-09-01)

“Passando a boiada”: aspectos dialógicos e interdiscursivos em textos relacionados ao discurso do Ministro do Meio Ambiente Ricardo Salles / “Passando a boiada”: dialogical and interdiscursive aspects in texts related to the speech of The Minister of the Environment Ricardo Salles
by: Camila Belizário Ribeiro, et al.
Published: (2021-07-01)

Video Scene Segmentation of TV Series Using Multimodal Neural Features
by: Aman Berhe, et al.
Published: (2019-07-01)

Detection of Important Scenes in Baseball Videos via a Time-Lag-Aware Multimodal Variational Autoencoder
by: Kaito Hirasawa, et al.
Published: (2021-03-01)

Modeling Japanese Praising Behavior by Analyzing Audio and Visual Behaviors
by: Toshiki Onishi, et al.
Published: (2022-03-01)

Multimodal fall detection for solitary individuals based on audio-video decision fusion processing
by: Shiqin Jiao, et al.
Published: (2024-04-01)

Miscommunication handling in spoken dialog systems based on error-aware dialog state detection
by: Chung-Hsien Wu, et al.
Published: (2017-05-01)

A Multimodal Late Fusion Framework for Physiological Sensor and Audio-Signal-Based Stress Detection: An Experimental Study and Public Dataset
by: Vasileios-Rafail Xefteris, et al.
Published: (2023-12-01)

Reading Multimodal Texts for Learning – a Model for Cultivating Multimodal Literacy
by: Kristina Danielsson, et al.
Published: (2016-08-01)

A Comparison of Human against Machine-Classification of Spatial Audio Scenes in Binaural Recordings of Music
by: Sławomir K. Zieliński, et al.
Published: (2020-08-01)

Convolution-Based Encoding of Depth Images for Transfer Learning in RGB-D Scene Classification
by: Radhakrishnan Gopalapillai, et al.
Published: (2021-11-01)

A multimodal fusion framework for urban scene understanding and functional identification using geospatial data
by: Chen Su, et al.
Published: (2024-03-01)

Promoting Dialogic Action through the Expansion of English Language Learners’ Communicative Repertoires
by: John Steven Gómez-Giraldo
Published: (2022-02-01)

To Make the Voice Heard
by: Spencer Roberts
Published: (2022-10-01)

Multimodal AutoML via Representation Evolution
by: Blaž Škrlj, et al.
Published: (2022-12-01)

Comprehensive Context Recognizer Based on Multimodal Sensors in a Smartphone
by: Sungyoung Lee, et al.
Published: (2012-09-01)

Spatial Audio Scene Characterization (SASC): Automatic Localization of Front-, Back-, Up-, and Down-Positioned Music Ensembles in Binaural Recordings
by: Sławomir K. Zieliński, et al.
Published: (2022-02-01)

Multimodal Sentiment Analysis Representations Learning via Contrastive Learning with Condense Attention Fusion
by: Huiru Wang, et al.
Published: (2023-03-01)

Stratégies d'un apprenant de langue dans une formation en ligne sur une plate-forme audio-synchrone / Strategies of a language learner in an online course on a synchronous audio platform
by: Laurence Jeannot, et al.

Modeling Long-Term Multimodal Representations for Active Speaker Detection With Spatio-Positional Encoder
by: Minyoung Kyoung, et al.
Published: (2023-01-01)

Learning long-term filter banks for audio source separation and audio scene classification
by: Teng Zhang, et al.
Published: (2018-05-01)

A Robust Approach to Multimodal Deepfake Detection
by: Davide Salvi, et al.
Published: (2023-06-01)

Multimodal cohesion and viewers' comprehension of scene transitions in film: an empirical investigation
by: Dayana Markhabayeva, et al.
Published: (2024-03-01)

The meaning of Photosynthesis acquired by Primary School Kids from an investigative activity using multimodal representation
by: Andréia de Freitas Zompero, et al.
Published: (2011-08-01)

Multimodality and children’s participation in classrooms: Instances of research
by: Denise Newfield
Published: (2011-03-01)

Multimodal Data Fusion in Learning Analytics: A Systematic Review
by: Su Mu, et al.
Published: (2020-11-01)