Voicevector: multimodal enrolment vectors for speaker separation

We present a transformer-based architecture for voice separation of a target speaker from multiple other speakers and ambient noise. We achieve this by using two separate neural networks: (A) An enrolment network designed to craft speaker-specific embeddings, exploiting various combinations of audio...

Bibliographic Details
Main Authors: Rahimi, A; Afouras, T; Zisserman, A
Format: Conference item
Language: English
Published: IEEE 2024