Consonant landmark detection for speech recognition

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.

Bibliographic Details
Main Author: Park, Chi-youn, 1981-
Other Authors: Kenneth N. Stevens.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2009
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/44905
Statement of Responsibility: by Chiyoun Park.
Degree: Ph.D.
Department: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Notes: This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 191-197).
Physical Description: 197 p.
File Format: application/pdf
Dates: 2009-03-20T19:30:50Z; 2008
Identifier: 297548228
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Summary: This thesis focuses on the detection of abrupt acoustic discontinuities in the speech signal, which constitute landmarks for consonant sounds. Because a large amount of phonetic information is concentrated near acoustic discontinuities, more focused speech analysis and recognition can be performed based on the landmarks. Three types of consonant landmarks (glottal vibration, turbulence noise, and sonorant consonant) are defined according to their characteristics, so that the appropriate analysis method for each landmark point can be determined. A probabilistic knowledge-based algorithm is developed in three steps. First, landmark candidates are detected and their landmark types are classified based on changes in spectral amplitude. Next, a bigram model describing the physiologically feasible sequences of consonant landmarks is proposed, so that the most likely landmark sequence among the candidates can be found. Finally, it has been observed that certain landmarks are ambiguous in certain sets of phonetic and prosodic contexts, while they can be reliably detected in other contexts. A method to represent the regions where the landmarks are reliably detected versus where they are ambiguous is presented.

On the TIMIT test set, 91% of all the consonant landmarks and 95% of obstruent landmarks are located as landmark candidates. The bigram-based process for determining the most likely landmark sequences yields 12% deletion and substitution rates and a 15% insertion rate. An alternative representation that distinguishes reliable and ambiguous regions detects 92% of the landmarks, and 40% of the landmarks are judged to be reliable. The deletion rate within reliable regions is as low as 5%.

The resulting landmark sequences form a basis for a knowledge-based speech recognition system, since the landmarks imply broad phonetic classes of the speech signal and indicate the points of focus for estimating detailed phonetic information. In addition, because the reliable regions generally correspond to lexical stresses and word boundaries, it is expected that the landmarks can guide the focus of attention not only at the phoneme level but at the phrase level as well.
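The second step of the summary, choosing the most likely landmark sequence among the candidates with a bigram model, can be illustrated with a small Viterbi search. This is only a sketch under stated assumptions, not the thesis's actual implementation: the function name `most_likely_landmarks`, the labels `+g`/`-g` (standing in for glottal vibration onset/offset), the `none` label for a rejected candidate, and every probability below are hypothetical values chosen for illustration.

```python
import math

def most_likely_landmarks(candidates, bigram, start="START", skip="none"):
    """Pick one decision per candidate so the joint score is maximal.

    candidates: list (in time order) of dicts {label: probability}; the
                label `skip` means "this candidate is not a landmark".
    bigram:     dict {(previous_label, label): probability}; a missing
                pair is treated as an infeasible transition.
    Returns a list with one entry per candidate: a label, or None if the
    candidate is rejected.
    """
    # best[state] = (log score, decisions so far); the state is the last
    # accepted landmark label (or `start` if none was accepted yet).
    best = {start: (0.0, [])}
    for scores in candidates:
        nxt = {}
        for prev, (logp, path) in best.items():
            for label, p in scores.items():
                if p <= 0.0:
                    continue
                if label == skip:
                    # Rejecting a candidate leaves the state unchanged.
                    state, step, decision = prev, math.log(p), None
                else:
                    t = bigram.get((prev, label), 0.0)
                    if t <= 0.0:
                        continue  # infeasible landmark order
                    state, step, decision = label, math.log(p) + math.log(t), label
                entry = (logp + step, path + [decision])
                if state not in nxt or entry[0] > nxt[state][0]:
                    nxt[state] = entry
        best = nxt
    if not best:
        raise ValueError("no feasible landmark sequence")
    return max(best.values(), key=lambda e: e[0])[1]

# Hypothetical example: the second candidate looks most like "+g" on its
# own, but "+g" cannot directly follow "+g" under this toy bigram model.
candidates = [
    {"+g": 0.9, "none": 0.1},
    {"+g": 0.6, "-g": 0.3, "none": 0.1},
    {"-g": 0.8, "none": 0.2},
]
bigram = {
    ("START", "+g"): 0.9,
    ("START", "-g"): 0.1,
    ("+g", "-g"): 1.0,
    ("-g", "+g"): 1.0,
}
print(most_likely_landmarks(candidates, bigram))
```

In this toy input the search keeps the first and third candidates and rejects the second, because the sequence constraint outweighs the second candidate's local score. A real system would estimate both the candidate scores and the bigram probabilities from labeled data.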