Modeling Long-Term Multimodal Representations for Active Speaker Detection With Spatio-Positional Encoder

In this study, we present an end-to-end framework for active speaker detection to achieve robust performance in challenging scenarios with multiple speakers. In contrast to recent approaches, which rely heavily on the visual relational context between all speakers in a video frame, we propose collab...

Bibliographic Details
Main Authors:	Minyoung Kyoung, Hwa Jeon Song
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Active speaker detection; audio-visual; multimodal representations; multi-speaker
Online Access:	https://ieeexplore.ieee.org/document/10287283/

Similar Items

Tracking Object-State Representations During Real-Time Language Comprehension by Native and Non-native Speakers of English
by: Xin Kang, et al.
Published: (2022-03-01)

Speech Enhancement for Multimodal Speaker Diarization System
by: Rehan Ahmad, et al.
Published: (2020-01-01)

Residual Information in Deep Speaker Embedding Architectures
by: Adriana Stan
Published: (2022-10-01)

Speaker Diarization and Identification From Single Channel Classroom Audio Recordings Using Virtual Microphones
by: Antonio Gomez, et al.
Published: (2022-01-01)

Speaker Recognition: Progression and challenges
by: Yusra Al-Irahyim, et al.
Published: (2021-09-01)

Local Control of Audio Environment: A Review of Methods and Applications
by: Jussi Kuutti, et al.
Published: (2014-02-01)

An Automatic Speaker Clustering Pipeline for the Air Traffic Communication Domain
by: Driss Khalil, et al.
Published: (2023-10-01)

Analysis of transition cost and model parameters in speaker diarization for meetings
by: Beatriz Martínez-González, et al.
Published: (2021-02-01)

Speaker Recognition in Uncontrolled Environment: A Review
by: Karamangala Narendra, et al.
Published: (2013-03-01)

Inaudible Attack on AI Speakers
by: Seyitmammet Saparmammedovich Alchekov, et al.
Published: (2023-04-01)

A Survey on Text-Dependent and Text-Independent Speaker Verification
by: Youzhi Tu, et al.
Published: (2022-01-01)

One-Shot Voice Conversion Algorithm Based on Representations Separation
by: Chunhui Deng, et al.
Published: (2020-01-01)

Becoming IELTS Examiners: Demystifying Native-Speakerism in the Area of English Language Testing
by: Pritz Hutabarat
Published: (2022-10-01)

Speaker Recognition Systems in the Last Decade – A Survey
by: Ahmed M. Ahmed, et al.
Published: (2021-03-01)

Speaker-turn aware diarization for speech-based cognitive assessments
by: Sean Shensheng Xu, et al.
Published: (2024-01-01)

Self Attention Networks in Speaker Recognition
by: Pooyan Safari, et al.
Published: (2023-05-01)

Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?
by: Héctor Delgado, et al.
Published: (2015-06-01)

A high-performance text-independent speaker identification of Arabic speakers using a CHMM-based approach
by: Hesham Tolba
Published: (2011-03-01)

An Experimental Comparison of Modeling Techniques and Combination of Speaker – Specific Information from Different Languages for Multilingual Speaker Identification
by: Jayanna H.S., et al.
Published: (2016-10-01)

Our speaker this evening: practical etiquette manual for masters of ceremonies, committee chairmen and pastors
by: Markley, Kenneth A.
Published: (1974)

Global–Local Self-Attention Based Transformer for Speaker Verification
by: Fei Xie, et al.
Published: (2022-10-01)

Laplacian Operator as Speaker Identification Parameter
by: S. K. Jamil
Published: (2009-12-01)

Discriminatory Practices Against Non-Native English Speaker Teachers in Colombia’s Language Centers: A Multimodal Study
by: Adriana Montoya, et al.
Published: (2024-01-01)

On the native/nonnative speaker notion and World Englishes: Debating with K. Rajagopalan
by: John Robert Schmitz

Restricted Boltzmann Machine Vectors for Speaker Clustering and Tracking Tasks in TV Broadcast Shows
by: Umair Khan, et al.
Published: (2019-07-01)

Speaking with a KN95 face mask: a within-subjects study on speaker adaptation and strategies to improve intelligibility
by: Sarah E. Gutz, et al.
Published: (2022-07-01)

Enhanced Multiple Speakers’ Separation and Identification for VOIP Applications Using Deep Learning
by: Amira A. Mohamed, et al.
Published: (2023-03-01)

Deconstructing the Native Speaker: Further Evidence From Heritage Speakers for Why This Horse Should Be Dead!
by: Wintai Tsehaye, et al.
Published: (2021-10-01)

High-Level CNN and Machine Learning Methods for Speaker Recognition
by: Giovanni Costantini, et al.
Published: (2023-03-01)

Native-speakerism and the complexity of personal experience: A duoethnographic study
by: Robert J. Lowe, et al.
Published: (2016-12-01)

A Study of Avoidance Strategy of Face Threat of Native Speaker and Non-Native Speaker by Using Goffman’s Face-Work Theory
by: Salmon Pandarangga
Published: (2016-05-01)

Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss
by: Labib Chowdhury, et al.
Published: (2020-10-01)

Speaker Recognition Based on Long-Term Acoustic Features With Analysis Sparse Representation
by: Ting Lin, et al.
Published: (2019-01-01)

An investigation of Filipino ESL learners’ language stereotypes toward Philippine lectal speakers using a Matched Guise Test
by: Henelsie B. Mendoza
Published: (2020-10-01)

Comparison of Modern Deep Learning Models for Speaker Verification
by: Vitalii Brydinskyi, et al.
Published: (2024-02-01)

New speakers versus old speakers. O akwizycji języka niemieckiego dwóch pokoleń na Mazurach [On the acquisition of German by two generations in Masuria]
by: Anna Jorroch
Published: (2022-05-01)

Speech Characteristics of Japanese Speakers Affecting American and Japanese Listener Evaluations
by: Atsuko Kashiwagi, et al.
Published: (2015-06-01)

Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
by: Rahul Sharma, et al.
Published: (2023-01-01)

Multi Speaker Natural Speech Synthesis Using Generative Flows
by: Dmitry Obukhov
Published: (2021-12-01)

Characterizations of native speakers by language teachers and students of Japanese and Chinese in the U.S.
by: Shinsuke Tsuchiya
Published: (2018-03-01)