Audio-Visual Overlapped Speech Detection for Spontaneous Distant Speech

Although advances in deep learning have brought remarkable improvements to Overlapped Speech Detection (OSD), the performance in far-field environments is still limited owing to the lack of real-world overlapped speech and a low signal-to-noise ratio. In this paper, we present an end-to-end audiovis...

Bibliographic Details
Main Authors:	Minyoung Kyoung, Hyungbae Jeon, Kiyoung Park
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Overlapped speech detection; far-field audio; data augmentation; audiovisual speech processing; multimodal deep learning
Online Access:	https://ieeexplore.ieee.org/document/10064301/

Similar Items

ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION
by: D.V. Ivanko, et al.
Published: (2016-05-01)

Noise-Robust Multimodal Audio-Visual Speech Recognition System for Speech-Based Interaction Applications
by: Sanghun Jeon, et al.
Published: (2022-10-01)

Multimodal Sensor-Input Architecture with Deep Learning for Audio-Visual Speech Recognition in Wild
by: Yibo He, et al.
Published: (2023-02-01)

Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli
by: Sodoyer David, et al.
Published: (2002-01-01)

TIMIT-TTS: A Text-to-Speech Dataset for Multimodal Synthetic Media Detection
by: Davide Salvi, et al.
Published: (2023-01-01)

Atypical Audio-visual speech perception and McGurk effects in children with Specific Language Impairment
by: Jacqueline Leybaert, et al.
Published: (2014-05-01)

Research on Robust Audio-Visual Speech Recognition Algorithms
by: Wenfeng Yang, et al.
Published: (2023-04-01)

How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?
by: Clémence Bayard, et al.
Published: (2014-05-01)

Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features
by: Aleksic Petar S, et al.
Published: (2002-01-01)

Top-Down Predictions of Familiarity and Congruency in Audio-Visual Speech Perception at Neural Level
by: Orsolya B. Kolozsvári, et al.
Published: (2019-07-01)

Development of the Mechanisms Underlying Audiovisual Speech Perception Benefit
by: Kaylah Lalonde, et al.
Published: (2021-01-01)

Erratum: Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children
by: Alan James Power, et al.
Published: (2013-12-01)

Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus
by: Patterson Eric K, et al.
Published: (2002-01-01)

Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?
by: Magnus Alm, et al.
Published: (2015-07-01)

APPLICATION OF PARTIAL LEAST SQUARES REGRESSION FOR AUDIO-VISUAL SPEECH PROCESSING AND MODELING
by: A. L. Oleinik
Published: (2015-09-01)

Towards audio codec-based speech separation
by: Yip, Jia Qi, et al.
Published: (2024)

Speech Emotion Recognition Using Audio Matching
by: Iti Chaturvedi, et al.
Published: (2022-11-01)

A Facial Feature and Lip Movement Enhanced Audio-Visual Speech Separation Model
by: Guizhu Li, et al.
Published: (2023-10-01)

Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications
by: Juraj Kacur, et al.
Published: (2022-08-01)

Integrative interaction of emotional speech in audio-visual modality
by: Haibin Dong, et al.
Published: (2022-11-01)

Sources of Confusion in Infant Audiovisual Speech Perception Research
by: Kathleen Elizabeth Shaw, et al.
Published: (2015-12-01)

Perceptual Doping: A Hypothesis on How Early Audiovisual Speech Stimulation Enhances Subsequent Auditory Speech Processing
by: Shahram Moradi, et al.
Published: (2023-04-01)

Multimodal prosody: gestures and speech in the perception of prominence in Spanish
by: Miguel Jiménez-Bravo, et al.
Published: (2024-03-01)

Hearing, seeing, and feeling speech: the neurophysiological correlates of trimodal speech perception
by: Doreen Hansmann, et al.
Published: (2023-08-01)

The effect of combined sensory and semantic components on audio-visual speech perception in older adults
by: Corrina Maguinness, et al.
Published: (2011-12-01)

Language Identification-Based Evaluation of Single Channel Speech Separation of Overlapped Speeches
by: Zuhragvl Aysa, et al.
Published: (2022-10-01)

Single-Channel Speech Enhancement Techniques for Distant Speech Recognition
by: Ashwini Jaya Kumar, et al.
Published: (2013-06-01)

Multi-Angle Lipreading with Angle Classification-Based Feature Extraction and Its Application to Audio-Visual Speech Recognition
by: Shinnosuke Isobe, et al.
Published: (2021-07-01)

A Review of Recent Advances on Deep Learning Methods for Audio-Visual Speech Recognition
by: Denis Ivanko, et al.
Published: (2023-06-01)

Assessing the efficacy of benchmarks for automatic speech accent recognition
by: Benjamin Bock, et al.
Published: (2015-08-01)

KMSAV: Korean multi-speaker spontaneous audiovisual dataset
by: Kiyoung Park, et al.
Published: (2024-02-01)

Distant speech recognition
by: Wolfel, Matthias, et al.
Published: (2009)

Multimodal Unsupervised Speech Translation for Recognizing and Evaluating Second Language Speech
by: Yun Kyung Lee, et al.
Published: (2021-03-01)

Contributions of local speech encoding and functional connectivity to audio-visual speech perception
by: Bruno L Giordano, et al.
Published: (2017-06-01)

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems
by: Sanghun Jeon, et al.
Published: (2024-02-01)

Improvement of Acoustic Models Fused with Lip Visual Information for Low-Resource Speech
by: Chongchong Yu, et al.
Published: (2023-02-01)

Future Speech Interfaces with Sensors and Machine Intelligence
by: Bruce Denby, et al.
Published: (2023-02-01)

Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition
by: Berthommier Frédéric, et al.
Published: (2002-01-01)

Perceptual Evaluation of Speech Quality for Inexpensive Recording Equipment
by: Anas Hashmi
Published: (2021-03-01)

Multimodal Corpus Design for Audio-Visual Speech Recognition in Vehicle Cabin
by: Alexey Kashevnik, et al.
Published: (2021-01-01)