SCAN: Learning speaker identity from noisy sensor data
Sensor data acquired from multiple sensors simultaneously is featuring increasingly in our ever more pervasive world. Buildings can be made smarter and more efficient, spaces more responsive to users. A fundamental building block towards smart spaces is the ability to understand who is present in a certain area.
Main Authors: Lu, C; Wen, H; Wang, S; Markham, A; Trigoni, N
Format: Conference item
Published: Association for Computing Machinery, 2017
Field | Value
---|---
_version_ | 1797092967335329792 |
author | Lu, C Wen, H Wang, S Markham, A Trigoni, N |
author_facet | Lu, C Wen, H Wang, S Markham, A Trigoni, N |
author_sort | Lu, C |
collection | OXFORD |
description | Sensor data acquired from multiple sensors simultaneously is featuring increasingly in our ever more pervasive world. Buildings can be made smarter and more efficient, spaces more responsive to users. A fundamental building block towards smart spaces is the ability to understand who is present in a certain area. A ubiquitous way of detecting this is to exploit the unique vocal features as people interact with one another. As an example, consider audio features sampled during a meeting, yielding a noisy set of possible voiceprints. With a number of meetings and knowledge of participation (e.g. through a calendar or MAC address), can we learn to associate a specific identity with a particular voiceprint? Obviously enrolling users into a biometric database is time-consuming and not robust to vocal deviations over time. To address this problem, the standard approach is to perform a clustering step (e.g. of audio data) followed by a data association step, when identity-rich sensor data is available. In this paper we show that this approach is not robust to noise in either type of sensor stream; to tackle this issue we propose a novel algorithm that jointly optimises the clustering and association process yielding up to three times higher identification precision than approaches that execute these steps sequentially. We demonstrate the performance benefits of our approach in two case studies, one with acoustic and MAC datasets that we collected from meetings in a non-residential building, and another from an online dataset from recorded radio interviews. |
first_indexed | 2024-03-07T03:53:34Z |
format | Conference item |
id | oxford-uuid:c2197fa5-6412-4886-b92a-23aa4bd6a524 |
institution | University of Oxford |
last_indexed | 2024-03-07T03:53:34Z |
publishDate | 2017 |
publisher | Association for Computing Machinery |
record_format | dspace |
spelling | oxford-uuid:c2197fa5-6412-4886-b92a-23aa4bd6a5242022-03-27T06:06:27ZSCAN: Learning speaker identity from noisy sensor dataConference itemhttp://purl.org/coar/resource_type/c_5794uuid:c2197fa5-6412-4886-b92a-23aa4bd6a524Symplectic Elements at OxfordAssociation for Computing Machinery2017Lu, CWen, HWang, SMarkham, ATrigoni, NSensor data acquired from multiple sensors simultaneously is featuring increasingly in our ever more pervasive world. Buildings can be made smarter and more efficient, spaces more responsive to users. A fundamental building block towards smart spaces is the ability to understand who is present in a certain area. A ubiquitous way of detecting this is to exploit the unique vocal features as people interact with one another. As an example, consider audio features sampled during a meeting, yielding a noisy set of possible voiceprints. With a number of meetings and knowledge of participation (e.g. through a calendar or MAC address), can we learn to associate a specific identity with a particular voiceprint? Obviously enrolling users into a biometric database is time-consuming and not robust to vocal deviations over time. To address this problem, the standard approach is to perform a clustering step (e.g. of audio data) followed by a data association step, when identity-rich sensor data is available. In this paper we show that this approach is not robust to noise in either type of sensor stream; to tackle this issue we propose a novel algorithm that jointly optimises the clustering and association process yielding up to three times higher identification precision than approaches that execute these steps sequentially. We demonstrate the performance benefits of our approach in two case studies, one with acoustic and MAC datasets that we collected from meetings in a non-residential building, and another from an online dataset from recorded radio interviews. |
spellingShingle | Lu, C Wen, H Wang, S Markham, A Trigoni, N SCAN: Learning speaker identity from noisy sensor data |
title | SCAN: Learning speaker identity from noisy sensor data |
title_full | SCAN: Learning speaker identity from noisy sensor data |
title_fullStr | SCAN: Learning speaker identity from noisy sensor data |
title_full_unstemmed | SCAN: Learning speaker identity from noisy sensor data |
title_short | SCAN: Learning speaker identity from noisy sensor data |
title_sort | scan learning speaker identity from noisy sensor data |
work_keys_str_mv | AT luc scanlearningspeakeridentityfromnoisysensordata AT wenh scanlearningspeakeridentityfromnoisysensordata AT wangs scanlearningspeakeridentityfromnoisysensordata AT markhama scanlearningspeakeridentityfromnoisysensordata AT trigonin scanlearningspeakeridentityfromnoisysensordata |
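The abstract contrasts the standard sequential pipeline (cluster voiceprints first, then associate clusters with identity-rich data such as MAC sightings) with SCAN's joint optimisation. The sequential baseline can be sketched as follows; this is not the paper's SCAN algorithm, only a minimal illustration under assumed toy data, with hand-rolled k-means for the clustering step and Hungarian matching for the association step (the helper names `kmeans` and `associate` and all inputs are invented for this sketch):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each voiceprint feature to its nearest center.
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def associate(labels, presence, k, n_ids):
    """Match clusters to identities by maximising co-occurrence counts."""
    score = np.zeros((k, n_ids))
    for seg_label, seg_presence in zip(labels, presence):
        # Credit cluster seg_label with the identities seen alongside it.
        score[seg_label] += seg_presence
    rows, cols = linear_sum_assignment(score, maximize=True)
    return dict(zip(rows.tolist(), cols.tolist()))

# Toy data: 8 voiceprint feature vectors from two well-separated speakers.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1]])
# presence[s] is a one-hot row: which identity (e.g. MAC address) was
# observed for segment s.
presence = np.array([[1, 0]] * 4 + [[0, 1]] * 4, dtype=float)

labels = kmeans(X, k=2)
mapping = associate(labels, presence, k=2, n_ids=2)
```

Because the two steps run independently here, any segments the clustering step mislabels are silently inherited by the association step; that lack of robustness to noise in either stream is the failure mode the paper's joint optimisation is designed to avoid.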