Combining linguistics and statistics for high-quality limited domain English-Chinese machine translation

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.

Bibliographic Details
Main Author: Xu, Yushi, Ph. D. Massachusetts Institute of Technology
Other Authors: Stephanie Seneff.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2009
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/44726
Notes: Includes bibliographical references (p. 86-87).
Abstract: Second language learning is a compelling activity in today's global markets. This thesis focuses on critical technology necessary to produce a computer spoken translation game for learning Mandarin Chinese in a relatively broad travel domain. Three main aspects are addressed: efficient Chinese parsing, high-quality English-Chinese machine translation, and how these technologies can be integrated into a translation game system. In the language understanding component, the TINA parser is enhanced with bottom-up and long-distance constraint features. The results showed that with these features, the Chinese grammar ran ten times faster and covered 15% more of the test set. In the machine translation component, a method combining linguistic and statistical systems is introduced. The English-Chinese translation is done via an intermediate language, "Zhonglish": the English-Zhonglish translation is accomplished by a parse-and-paraphrase paradigm using hand-coded rules, mainly for structural reconstruction, and the Zhonglish-Chinese translation is accomplished by a standard phrase-based statistical machine translation system, mostly performing word sense disambiguation and lexicon mapping. We evaluated the system on an independent test set from the IWSLT travel-domain spoken language corpus. Substantial improvements were achieved for GIZA alignment crossover: we obtained a 45% decrease in crossovers compared to a traditional phrase-based statistical MT system. Furthermore, the BLEU score improved by 2 points. Finally, a framework for the translation game system is described, and the feasibility of integrating the components to produce reference translations and to automatically assess students' translations is verified.
Statement of Responsibility: by Yushi Xu.
Degree: S.M.
Dates: 2008; 2009-03-16T19:35:02Z
Identifiers: http://hdl.handle.net/1721.1/44726; 298124776
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Physical Description: 93 p.; application/pdf