Multilingual techniques for low resource automatic speech recognition

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

Bibliographic Details
Main Author:	Chuangsuwanich, Ekapol
Other Authors:	James Glass.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2016
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/105571

Notes:	This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages [133]-143).
Physical Description:	143 pages (application/pdf)
Other Identifier:	963858647
Date Available:	2016-12-05
Rights:	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission: http://dspace.mit.edu/handle/1721.1/7582

Abstract

Out of the approximately 7000 languages spoken around the world, only about 100 have Automatic Speech Recognition (ASR) capability. This is because building a speech recognizer requires a vast amount of resources. These often include thousands of hours of transcribed speech data, a phonetic pronunciation dictionary (lexicon) that spans all words in the language, and a text collection on the order of several million words. Moreover, ASR technologies usually require years of research to deal with the specific idiosyncrasies of each language. This makes building a speech recognizer for a language with few resources a daunting task. In this thesis, we propose a universal ASR framework for transcription and keyword spotting (KWS) tasks that works on a variety of languages. We investigate methods to address the need for a pronunciation dictionary by using a Pronunciation Mixture Model that can learn from existing lexicons and acoustic data to generate pronunciations for new words. When no dictionary is available, a graphemic lexicon provides performance comparable to that of an expert lexicon.

To alleviate the need for text corpora, we investigate the use of subwords and web data, which help improve KWS results. Finally, we reduce the need for speech recordings by using bottleneck (BN) features trained on multilingual corpora. We first propose the Low-rank Stacked Bottleneck architecture, which improves ASR performance over previous state-of-the-art systems. We then investigate a data-driven method for selecting the data from various languages that is most similar to the target language, which helps improve the effectiveness of the BN features. Using the techniques described and proposed in this thesis, we are able to more than double the KWS performance for a low-resource language compared to standard techniques geared towards resource-rich domains.
