Modeling Long-Term Multimodal Representations for Active Speaker Detection With Spatio-Positional Encoder
In this study, we present an end-to-end framework for active speaker detection to achieve robust performance in challenging scenarios with multiple speakers. In contrast to recent approaches, which rely heavily on the visual relational context between all speakers in a video frame, we propose collab...
Main Authors: | Minyoung Kyoung, Hwa Jeon Song |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Online Access: | https://ieeexplore.ieee.org/document/10287283/ |
Similar Items
- Tracking Object-State Representations During Real-Time Language Comprehension by Native and Non-native Speakers of English
  by: Xin Kang, et al.
  Published: (2022-03-01)
- Speech Enhancement for Multimodal Speaker Diarization System
  by: Rehan Ahmad, et al.
  Published: (2020-01-01)
- Residual Information in Deep Speaker Embedding Architectures
  by: Adriana Stan
  Published: (2022-10-01)
- Speaker Diarization and Identification From Single Channel Classroom Audio Recordings Using Virtual Microphones
  by: Antonio Gomez, et al.
  Published: (2022-01-01)
- Speaker Recognition: Progression and challenges
  by: Yusra Al-Irahyim, et al.
  Published: (2021-09-01)