Voicevector: multimodal enrolment vectors for speaker separation
We present a transformer-based architecture for voice separation of a target speaker from multiple other speakers and ambient noise. We achieve this by using two separate neural networks: (A) An enrolment network designed to craft speakerspecific embeddings, exploiting various combinations of audio...
Main Authors: | , , |
---|---|
Format: | Conference item |
Language: | English |
Published: |
IEEE
2024
|