Combining linguistics and statistics for high-quality limited domain English-Chinese machine translation
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
Main Author: | Xu, Yushi |
---|---|
Other Authors: | Stephanie Seneff |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology 2009 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/44726 |
_version_ | 1811078373284249600 |
---|---|
author | Xu, Yushi, Ph. D. Massachusetts Institute of Technology |
author2 | Stephanie Seneff. |
author_facet | Stephanie Seneff. Xu, Yushi, Ph. D. Massachusetts Institute of Technology |
author_sort | Xu, Yushi, Ph. D. Massachusetts Institute of Technology |
collection | MIT |
description | Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. |
first_indexed | 2024-09-23T10:58:29Z |
format | Thesis |
id | mit-1721.1/44726 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T10:58:29Z |
publishDate | 2009 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/44726 2019-04-12T20:28:06Z Combining linguistics and statistics for high-quality limited domain English-Chinese machine translation Xu, Yushi, Ph. D. Massachusetts Institute of Technology Stephanie Seneff. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 86-87). Second language learning is a compelling activity in today's global markets. This thesis focuses on critical technology necessary to produce a computer spoken translation game for learning Mandarin Chinese in a relatively broad travel domain. Three main aspects are addressed: efficient Chinese parsing, high-quality English-Chinese machine translation, and how these technologies can be integrated into a translation game system. In the language understanding component, the TINA parser is enhanced with bottom-up and long-distance constraint features. The results showed that with these features, the Chinese grammar ran ten times faster and covered 15% more of the test set. In the machine translation component, a method combining linguistic and statistical systems is introduced. The English-Chinese translation is done via an intermediate language, "Zhonglish": the English-Zhonglish translation is accomplished by a parse-and-paraphrase paradigm using hand-coded rules, mainly for structural reconstruction, and the Zhonglish-Chinese translation is accomplished by a standard phrase-based statistical machine translation system, mostly performing word sense disambiguation and lexicon mapping. We evaluated on an independent test set from the IWSLT travel-domain spoken language corpus. Substantial improvements were achieved for GIZA alignment crossover: we obtained a 45% decrease in crossovers compared to a traditional phrase-based statistical MT system. Furthermore, the BLEU score improved by 2 points. Finally, a framework of the translation game system is described, and the feasibility of integrating the components to produce reference translations and to automatically assess students' translations is verified. by Yushi Xu. S.M. 2009-03-16T19:35:02Z 2009-03-16T19:35:02Z 2008 2008 Thesis http://hdl.handle.net/1721.1/44726 298124776 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 93 p. application/pdf Massachusetts Institute of Technology |
spellingShingle | Electrical Engineering and Computer Science. Xu, Yushi, Ph. D. Massachusetts Institute of Technology Combining linguistics and statistics for high-quality limited domain English-Chinese machine translation |
title | Combining linguistics and statistics for high-quality limited domain English-Chinese machine translation |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/44726 |
work_keys_str_mv | AT xuyushiphdmassachusettsinstituteoftechnology combininglinguisticsandstatisticsforhighqualitylimiteddomainenglishchinesemachinetranslation |
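
The abstract reports a 45% decrease in GIZA alignment crossovers. This record does not say how the thesis counts a crossover, so the following is only a minimal sketch under one common assumption: two word-alignment links cross when their source-side order and target-side order disagree. The function name `count_crossovers` and the toy alignments are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def count_crossovers(links):
    """Count pairs of alignment links that cross each other.

    `links` is an iterable of (source_index, target_index) pairs.
    Assumption: two links cross when one precedes the other on the
    source side but follows it on the target side.
    """
    unique = sorted(set(links))
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(unique, 2)
        if (s1 - s2) * (t1 - t2) < 0
    )


# Hypothetical toy alignments for illustration only.
monotone = [(0, 0), (1, 1), (2, 2)]   # same word order on both sides
reordered = [(0, 2), (1, 0), (2, 1)]  # first source word moves to the end

print(count_crossovers(monotone))
print(count_crossovers(reordered))
```

Under this definition, a source sentence whose word order already matches the target yields fewer crossing links, which is consistent with the abstract's motivation for restructuring English into "Zhonglish" before the statistical step.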