Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

Bibliographic Details
Main Author: Shum, Stephen (Stephen Hin-Chung)
Other Authors: James R. Glass and Najim Dehak.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2016
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/105952
author Shum, Stephen (Stephen Hin-Chung)
author2 James R. Glass and Najim Dehak.
author_facet James R. Glass and Najim Dehak.
Shum, Stephen (Stephen Hin-Chung)
author_sort Shum, Stephen (Stephen Hin-Chung)
collection MIT
description Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
first_indexed 2024-09-23T12:32:27Z
format Thesis
id mit-1721.1/105952
institution Massachusetts Institute of Technology
language eng
last_indexed 2024-09-23T12:32:27Z
publishDate 2016
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/105952 2019-04-12T17:27:27Z Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition Shum, Stephen (Stephen Hin-Chung) James R. Glass and Najim Dehak. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 139-149). We live in an era with almost unlimited access to data. Yet without their proper tagging and annotation, we often struggle to make effective use of most of it. And sometimes, the labels we have access to are not even the ones we really need for the task at hand. Asking human experts for input can be time-consuming and expensive, thus bringing to bear a need for better ways to handle and process unlabeled data. In particular, successful methods in unsupervised domain adaptation can automatically recognize and adapt existing algorithms to systematic changes in the input. Furthermore, methods that can organize incoming streams of information can allow us to derive insights with minimal manual labeling effort - this is the notion of weakly supervised learning. In this thesis, we explore these two themes in the context of speaker and language recognition. First, we consider the problem of adapting an existing algorithm for speaker recognition to a systematic change in our input domain. 
Then we undertake the scenario in which we start with only unlabeled data and are allowed to select a subset of examples to be labeled, with the goal of minimizing the number of actively labeled examples needed to achieve acceptable speaker recognition performance. Turning to language recognition, we aim to decrease our reliance on transcribed speech via the use of a large-scale model for discovering sub-word units from multilingual data in an unsupervised manner. In doing so, we observe the impact of even small bits of linguistic knowledge and use this as inspiration to improve our sub-word unit discovery methods via the use of weak, pronunciation-equivalent constraints. by Stephen H. Shum. Ph. D. 2016-12-22T15:16:13Z 2016-12-22T15:16:13Z 2016 2016 Thesis http://hdl.handle.net/1721.1/105952 965383477 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 149 pages application/pdf Massachusetts Institute of Technology
spellingShingle Electrical Engineering and Computer Science.
Shum, Stephen (Stephen Hin-Chung)
Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition
title Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition
title_sort overcoming resource limitations in the processing of unlimited speech applications to speaker and language recognition
topic Electrical Engineering and Computer Science.
url http://hdl.handle.net/1721.1/105952