Unsupervised speech processing with applications to query-by-example spoken term detection

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.

Bibliographic Details
Main Author: Zhang, Yaodong, Ph. D. Massachusetts Institute of Technology
Other Authors: James R. Glass.
Format: Thesis
Language: English
Published: Massachusetts Institute of Technology, 2013
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/79217
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (p. 163-173).
Physical Description: 173 p.
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission: http://dspace.mit.edu/handle/1721.1/7582

Abstract:
This thesis is motivated by the challenge of searching for and extracting useful information from speech data in a completely unsupervised setting. In many real-world speech processing problems, obtaining annotated data is neither cost- nor time-effective. We therefore ask how much we can learn from speech data without any transcription. To address this question, we chose query-by-example spoken term detection as a specific scenario to demonstrate that this task can be performed in an unsupervised setting without any annotations. To build the unsupervised spoken term detection framework, we contributed three main techniques that together form a complete workflow. First, we present two posteriorgram-based speech representations that enable speaker-independent and noise-robust spoken term matching. The feasibility and effectiveness of both posteriorgram features are demonstrated through a set of spoken term detection experiments on different datasets. Second, we present two lower-bounding methods for Dynamic Time Warping (DTW)-based pattern matching. Both algorithms greatly outperform conventional DTW in a single-threaded computing environment. Third, we describe a parallel implementation of the lower-bounded DTW search algorithm. Experimental results indicate that the total running time of the entire spoken term detection system grows linearly with corpus size. We also present the training of large Deep Belief Networks (DBNs) on Graphics Processing Units (GPUs). A phonetic classification experiment on the TIMIT corpus showed speed-ups of 36x for pre-training and 45x for back-propagation when a two-layer DBN was trained on the GPU platform rather than the CPU platform.
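The abstract names lower-bounded DTW search over posteriorgrams without showing how the pieces fit together. The following is a minimal, generic sketch under stated assumptions, not the thesis's implementation: it assumes each frame is a posterior probability vector, uses the inner-product distance -log(q . x) as the local cost (with an assumed floor EPS to keep the logarithm finite), restricts warping to a band of half-width r around the diagonal, and compares the query only with corpus segments of the same length. For each query frame the bound takes the element-wise maximum (upper envelope U) of the target frames inside its band; because posteriors are non-negative, q . x <= q . U for every frame x in that window, and because every query frame lies on the warping path at least once with a non-negative cost, the summed bound never exceeds the true DTW cost. A candidate can therefore be skipped whenever its bound already reaches the best distance found so far. All function names and the synthetic data generator are hypothetical.

```python
import math
import random

EPS = 1e-10  # assumed floor that keeps log() finite when two frames share no mass


def local_dist(q, x):
    """Inner-product distance -log(q . x) between two posterior vectors."""
    return -math.log(max(sum(a * b for a, b in zip(q, x)), EPS))


def dtw_band(query, target, r):
    """Band-constrained DTW cost (|i - j| <= r) between two posteriorgrams."""
    n, m = len(query), len(target)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = local_dist(query[i - 1], target[j - 1]) + step
    return cost[n][m]


def lb_inner_product(query, target, r):
    """Envelope lower bound on dtw_band(query, target, r)."""
    total = 0.0
    for i, q in enumerate(query):
        # Element-wise maximum of the target frames a warping path may pair with q.
        window = target[max(0, i - r):min(len(target), i + r + 1)]
        upper = [max(column) for column in zip(*window)]
        # q . upper >= q . x for every x in the window, so this term never
        # exceeds the cheapest local cost of query frame i.
        total += local_dist(q, upper)
    return total


def search(query, corpus, r):
    """Slide the query over the corpus and return (start, distance, pruned)."""
    n = len(query)
    best, best_start, pruned = math.inf, -1, 0
    for start in range(len(corpus) - n + 1):
        segment = corpus[start:start + n]
        if lb_inner_product(query, segment, r) >= best:
            pruned += 1  # the full DTW cannot beat the current best
            continue
        d = dtw_band(query, segment, r)
        if d < best:
            best, best_start = d, start
    return best_start, best, pruned


def random_posteriorgram(num_frames, num_classes, rng):
    """Hypothetical synthetic data: each frame is a normalized random vector."""
    frames = []
    for _ in range(num_frames):
        weights = [rng.random() + 1e-6 for _ in range(num_classes)]
        total = sum(weights)
        frames.append([w / total for w in weights])
    return frames


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = random_posteriorgram(120, 4, rng)
    query = corpus[40:55]
    print(search(query, corpus, r=3))
```

The pruned search returns the same best start and distance as running DTW on every segment; the bound only saves work by skipping segments whose full DTW could not improve on the current best. The `pruned` count reports how many DTW computations were avoided.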