Utterance-level aggregation for speaker recognition in the wild

The objective of this paper is speaker recognition 'in the wild', where utterances may be of variable length and may also contain irrelevant signals. Crucial elements in the design of deep networks for this task are the type of trunk (frame-level) network and the method of temporal aggregation. W...

Bibliographic Details
Main Authors: Xie, W, Nagrani, A, Chung, J, Zisserman, A
Format: Conference item
Published: IEEE 2019
