A comparison-based approach to mispronunciation detection

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.

Bibliographic Details
Main Author: Lee, Ann, Ph. D. Massachusetts Institute of Technology
Other Authors: James Glass.
Format: Thesis
Language: English
Published: Massachusetts Institute of Technology 2012
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/75660
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (p. 89-92).

Abstract: This thesis focuses on the problem of detecting word-level mispronunciations in nonnative speech. Conventional mispronunciation detection systems based on automatic speech recognition (ASR) have the disadvantage of requiring a large amount of language-specific, annotated training data, and some even require one speech recognizer in the target language and another in the students' native language. To reduce human labeling effort and to generalize across languages, we propose a comparison-based framework that requires only word-level timing information from the native training data. Assuming that the student is trying to enunciate the given script, dynamic time warping (DTW) is carried out between a student's utterance (nonnative speech) and a teacher's utterance (native speech), and we focus on detecting mis-alignment in the warping path and the distance matrix. The first stage of the system locates word boundaries in the nonnative utterance. Because nonnative speech often contains intra-word pauses, we run DTW with a silence model that aligns the two utterances and detects and removes silences at the same time.
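The idea of letting DTW absorb pauses while it aligns two utterances can be illustrated with a small sketch. This is our own simplified formulation, not necessarily the silence model used in the thesis. Each student frame is either matched to a teacher frame at Euclidean cost, or it is labelled silence at the cost of its distance to a single silence template vector. The function `dtw_with_silence`, the `silence` template, and the `sil_penalty` parameter are hypothetical names introduced for illustration. Two DP tables are kept so that no frame is counted as both matched and silent.

```python
import numpy as np


def dtw_with_silence(student, teacher, silence, sil_penalty=0.0):
    """Hypothetical sketch: align student frames (n x d) to teacher frames (m x d)
    with DTW, letting any student frame be absorbed by a silence template instead.

    Returns (cost, path, silent): `path` holds 0-based (student, teacher)
    index pairs for matched frames, `silent` the student frames labelled silence.
    """
    x = np.asarray(student, dtype=float)
    y = np.asarray(teacher, dtype=float)
    n, m = len(x), len(y)
    # Euclidean frame distances (n x m) and per-frame silence costs (n,).
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    sil = np.linalg.norm(x - np.asarray(silence, dtype=float), axis=-1) + sil_penalty

    # Two tables (1-based frame indices) so a frame is never both matched and silent:
    # M[i, j]: student frame i matched to teacher frame j.
    # S[i, j]: student frame i labelled silence after teacher frames 1..j.
    M = np.full((n + 1, m + 1), np.inf)
    S = np.full((n + 1, m + 1), np.inf)
    M[0, 0] = 0.0
    back_m, back_s = {}, {}
    for i in range(1, n + 1):
        for j in range(m + 1):
            # Silence step: consume student frame i, teacher position unchanged.
            cost, prev = min((M[i - 1, j], "M"), (S[i - 1, j], "S"), key=lambda t: t[0])
            S[i, j] = sil[i - 1] + cost
            back_s[i, j] = (i - 1, j, prev)
            if j == 0:
                continue
            # Match step: usual DTW moves; a horizontal move cannot follow silence.
            cost, ptr = min(
                (M[i - 1, j - 1], (i - 1, j - 1, "M")),
                (S[i - 1, j - 1], (i - 1, j - 1, "S")),
                (M[i - 1, j], (i - 1, j, "M")),
                (S[i - 1, j], (i - 1, j, "S")),
                (M[i, j - 1], (i, j - 1, "M")),
                key=lambda t: t[0],
            )
            M[i, j] = dist[i - 1, j - 1] + cost
            back_m[i, j] = ptr

    # Trace back from whichever table ends cheaper at (n, m).
    state = "M" if M[n, m] <= S[n, m] else "S"
    total = min(M[n, m], S[n, m])
    i, j = n, m
    path, silent = [], []
    while i > 0:
        if state == "M":
            path.append((i - 1, j - 1))
            i, j, state = back_m[i, j]
        else:
            silent.append(i - 1)
            i, j, state = back_s[i, j]
    return total, path[::-1], silent[::-1]
```

Consider 1-D features with teacher frames `[1, 5, 9]` and a student utterance `[1, 5, 0, 0, 9]` that contains a pause. With a silence template of `[0]`, the two pause frames (indices 2 and 3) are removed instead of being forced onto teacher frames. The matched path is then `[(0, 0), (1, 1), (4, 2)]`.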
To segment each word into smaller, acoustically similar units for finer-grained analysis, we develop a phoneme-like unit segmentor that divides the self-similarity matrix into low-distance regions along the diagonal. Phone-level and word-level features describing the degree of mis-alignment between the two utterances are extracted, and the problem is formulated as a classification task. Support vector machine (SVM) classifiers are trained, and three voting schemes are considered for cases where there is more than one matching reference utterance. The system is evaluated on the Chinese University Chinese Learners of English (CUCHLOE) corpus, with the TIMIT corpus used as the native corpus. Experimental results show 1) the effectiveness of the silence model in guiding DTW to capture the word boundaries in nonnative speech more accurately, 2) the complementary performance of the word-level and phone-level features, and 3) the stable performance of the system with or without phonetic unit labeling.

Statement of responsibility: by Ann Lee.
Degree: S.M.
Date added to repository: 2012-12-13
Other identifier: 818652623
Physical description: 92 p.
File format: application/pdf
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
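The phoneme-like unit segmentor described in the abstract divides a word's self-similarity matrix into low-distance regions along its diagonal. The sketch below is one way to express that idea and rests on assumptions of our own. It fixes the number of units per word and scores each square diagonal block by its sum of within-block distances divided by block length. It then finds the lowest-cost split points with an exhaustive dynamic program. The names `self_similarity`, `segment_diagonal`, and `num_segments` are hypothetical, and the thesis may choose the number of units and the block cost differently.

```python
import numpy as np


def self_similarity(frames):
    """Euclidean distance between every pair of frames (T x T)."""
    x = np.asarray(frames, dtype=float)
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def segment_diagonal(dist, num_segments):
    """Hypothetical sketch: split frames 0..T-1 into `num_segments` contiguous
    units whose square diagonal blocks in `dist` have low total cost.

    Block cost (an assumption): sum of distances inside the block / its length.
    Returns a list of (start, end) half-open frame ranges.
    """
    T = dist.shape[0]
    if not 1 <= num_segments <= T:
        raise ValueError("need 1 <= num_segments <= number of frames")
    # 2-D prefix sums: P[i, j] = dist[:i, :j].sum(), so any block sum is O(1).
    P = np.zeros((T + 1, T + 1))
    P[1:, 1:] = dist.cumsum(axis=0).cumsum(axis=1)

    def block_cost(a, b):
        total = P[b, b] - P[a, b] - P[b, a] + P[a, a]
        return total / (b - a)

    # best[k, b]: minimum cost of covering frames [0, b) with k segments.
    best = np.full((num_segments + 1, T + 1), np.inf)
    arg = np.zeros((num_segments + 1, T + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, num_segments + 1):
        for b in range(k, T + 1):
            for a in range(k - 1, b):
                c = best[k - 1, a] + block_cost(a, b)
                if c < best[k, b]:
                    best[k, b] = c
                    arg[k, b] = a
    # Walk the stored split points back from the end of the word.
    bounds, b = [], T
    for k in range(num_segments, 0, -1):
        a = int(arg[k, b])
        bounds.append((a, b))
        b = a
    return bounds[::-1]
```

With 1-D frames `[0, 0, 0, 5, 5, 9, 9, 9]` and three units, the only zero-cost partition is `[(0, 3), (3, 5), (5, 8)]`. Each of these units is a constant, zero-distance block on the diagonal. The dynamic program runs in O(K·T²) block evaluations, which is affordable at the length of a single word.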