Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

Bibliographic Details
Main Author:	Shum, Stephen (Stephen Hin-Chung)
Other Authors:	James R. Glass and Najim Dehak.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2016
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/105952

_version_	1811083399917469696
author	Shum, Stephen (Stephen Hin-Chung)
author2	James R. Glass and Najim Dehak.
author_facet	James R. Glass and Najim Dehak. Shum, Stephen (Stephen Hin-Chung)
author_sort	Shum, Stephen (Stephen Hin-Chung)
collection	MIT
description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
first_indexed	2024-09-23T12:32:27Z
format	Thesis
id	mit-1721.1/105952
institution	Massachusetts Institute of Technology
language	eng
last_indexed	2024-09-23T12:32:27Z
publishDate	2016
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/105952 2019-04-12T17:27:27Z Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition Shum, Stephen (Stephen Hin-Chung) James R. Glass and Najim Dehak. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 139-149). We live in an era with almost unlimited access to data. Yet without their proper tagging and annotation, we often struggle to make effective use of most of it. And sometimes, the labels we have access to are not even the ones we really need for the task at hand. Asking human experts for input can be time-consuming and expensive, thus bringing to bear a need for better ways to handle and process unlabeled data. In particular, successful methods in unsupervised domain adaptation can automatically recognize and adapt existing algorithms to systematic changes in the input. Furthermore, methods that can organize incoming streams of information can allow us to derive insights with minimal manual labeling effort - this is the notion of weakly supervised learning. In this thesis, we explore these two themes in the context of speaker and language recognition. First, we consider the problem of adapting an existing algorithm for speaker recognition to a systematic change in our input domain. Then we undertake the scenario in which we start with only unlabeled data and are allowed to select a subset of examples to be labeled, with the goal of minimizing the number of actively labeled examples needed to achieve acceptable speaker recognition performance. Turning to language recognition, we aim to decrease our reliance on transcribed speech via the use of a large-scale model for discovering sub-word units from multilingual data in an unsupervised manner. In doing so, we observe the impact of even small bits of linguistic knowledge and use this as inspiration to improve our sub-word unit discovery methods via the use of weak, pronunciation-equivalent constraints. by Stephen H. Shum. Ph. D. 2016-12-22T15:16:13Z 2016-12-22T15:16:13Z 2016 2016 Thesis http://hdl.handle.net/1721.1/105952 965383477 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 149 pages application/pdf Massachusetts Institute of Technology
spellingShingle	Electrical Engineering and Computer Science. Shum, Stephen (Stephen Hin-Chung) Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition
title	Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition
title_full	Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition
title_fullStr	Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition
title_full_unstemmed	Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition
title_short	Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition
title_sort	overcoming resource limitations in the processing of unlimited speech applications to speaker and language recognition
topic	Electrical Engineering and Computer Science.
url	http://hdl.handle.net/1721.1/105952
work_keys_str_mv	AT shumstephenstephenhinchung overcomingresourcelimitationsintheprocessingofunlimitedspeechapplicationstospeakerandlanguagerecognition
