Multisensory Fusion for Unsupervised Spatiotemporal Speaker Diarization

Speaker diarization answers the question of “who spoke when” in audio recordings. In meeting scenarios, labeling audio with the corresponding speaker identities can be further assisted by exploiting spatial features. This work proposes a framework designed to asses...

Bibliographic Details
Main Authors:	Paris Xylogiannis, Nikolaos Vryzas, Lazaros Vrysis, Charalampos Dimoulas
Format:	Article
Language:	English
Published:	MDPI AG 2024-06-01
Series:	Sensors
Subjects:	speaker diarization, sound localization, AI-enabled systems, multimodal decision making, deep learning, smartphones
Online Access:	https://www.mdpi.com/1424-8220/24/13/4229

Similar Items