Voicevector: multimodal enrolment vectors for speaker separation

We present a transformer-based architecture for voice separation of a target speaker from multiple other speakers and ambient noise. We achieve this by using two separate neural networks: (A) An enrolment network designed to craft speaker-specific embeddings, exploiting various combinations of audio...

Bibliographic Details
Main Authors:	Rahimi, A, Afouras, T, Zisserman, A
Format:	Conference item
Language:	English
Published:	IEEE 2024
