Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
Main Author: | Shum, Stephen (Stephen Hin-Chung) |
---|---|
Other Authors: | James R. Glass and Najim Dehak. |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2016 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/105952 |
_version_ | 1811083399917469696 |
---|---|
author | Shum, Stephen (Stephen Hin-Chung) |
author2 | James R. Glass and Najim Dehak. |
author_facet | James R. Glass and Najim Dehak. Shum, Stephen (Stephen Hin-Chung) |
author_sort | Shum, Stephen (Stephen Hin-Chung) |
collection | MIT |
description | Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. |
first_indexed | 2024-09-23T12:32:27Z |
format | Thesis |
id | mit-1721.1/105952 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T12:32:27Z |
publishDate | 2016 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/105952 2019-04-12T17:27:27Z Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition Shum, Stephen (Stephen Hin-Chung) James R. Glass and Najim Dehak. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 139-149). We live in an era with almost unlimited access to data. Yet without their proper tagging and annotation, we often struggle to make effective use of most of it. And sometimes, the labels we have access to are not even the ones we really need for the task at hand. Asking human experts for input can be time-consuming and expensive, thus bringing to bear a need for better ways to handle and process unlabeled data. In particular, successful methods in unsupervised domain adaptation can automatically recognize and adapt existing algorithms to systematic changes in the input. Furthermore, methods that can organize incoming streams of information can allow us to derive insights with minimal manual labeling effort - this is the notion of weakly supervised learning. In this thesis, we explore these two themes in the context of speaker and language recognition. First, we consider the problem of adapting an existing algorithm for speaker recognition to a systematic change in our input domain. 
Then we undertake the scenario in which we start with only unlabeled data and are allowed to select a subset of examples to be labeled, with the goal of minimizing the number of actively labeled examples needed to achieve acceptable speaker recognition performance. Turning to language recognition, we aim to decrease our reliance on transcribed speech via the use of a large-scale model for discovering sub-word units from multilingual data in an unsupervised manner. In doing so, we observe the impact of even small bits of linguistic knowledge and use this as inspiration to improve our sub-word unit discovery methods via the use of weak, pronunciation-equivalent constraints. by Stephen H. Shum. Ph. D. 2016-12-22T15:16:13Z 2016-12-22T15:16:13Z 2016 2016 Thesis http://hdl.handle.net/1721.1/105952 965383477 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 149 pages application/pdf Massachusetts Institute of Technology |
spellingShingle | Electrical Engineering and Computer Science. Shum, Stephen (Stephen Hin-Chung) Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition |
title | Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition |
title_full | Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition |
title_fullStr | Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition |
title_full_unstemmed | Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition |
title_short | Overcoming resource limitations in the processing of unlimited speech : applications to speaker and language recognition |
title_sort | overcoming resource limitations in the processing of unlimited speech applications to speaker and language recognition |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/105952 |
work_keys_str_mv | AT shumstephenstephenhinchung overcomingresourcelimitationsintheprocessingofunlimitedspeechapplicationstospeakerandlanguagerecognition |