Context-dependent type-level models for unsupervised morpho-syntactic induction
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
Main Author: | Lee, Yoong Keok |
---|---|
Other Authors: | Regina Barzilay |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2015 |
Subjects: | Electrical Engineering and Computer Science |
Online Access: | http://hdl.handle.net/1721.1/97759 |
_version_ | 1811092314849804288 |
---|---|
author | Lee, Yoong Keok |
author2 | Regina Barzilay. |
collection | MIT |
description | Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. |
first_indexed | 2024-09-23T15:16:28Z |
format | Thesis |
id | mit-1721.1/97759 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T15:16:28Z |
publishDate | 2015 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/97759 2019-04-10T12:45:18Z Context-dependent type-level models for unsupervised morpho-syntactic induction Lee, Yoong Keok Regina Barzilay. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 126-141). This thesis improves unsupervised methods for part-of-speech (POS) induction and morphological word segmentation by modeling linguistic phenomena not used by previous methods. For both tasks, we realize these linguistic intuitions with Bayesian generative models that first create a latent lexicon before generating the unannotated tokens of the input corpus. Our POS induction model explicitly incorporates properties of POS tags at the type level, which existing token-based approaches do not parameterize. This enables our model to outperform previous approaches on a range of languages that exhibit substantial syntactic variation. In our morphological segmentation model, we exploit the fact that affixes are correlated within a word and between adjacent words. We surpass previous unsupervised segmentation systems on the Modern Standard Arabic Treebank data set. Finally, we showcase the utility of our unsupervised segmentation model for machine translation of Levantine dialectal Arabic, for which there is no known segmenter. We demonstrate that our segmenter outperforms supervised and knowledge-based alternatives. by Yoong Keok Lee. Ph. D.
2015-07-17T19:12:14Z 2015-07-17T19:12:14Z 2015 2015 Thesis http://hdl.handle.net/1721.1/97759 912300731 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 141 pages application/pdf Massachusetts Institute of Technology |
title | Context-dependent type-level models for unsupervised morpho-syntactic induction |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/97759 |