Context-dependent type-level models for unsupervised morpho-syntactic induction

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.

Bibliographic Details
Main Author: Lee, Yoong Keok
Other Authors: Regina Barzilay.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2015
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/97759
_version_ 1811092314849804288
author Lee, Yoong Keok
author2 Regina Barzilay.
author_facet Regina Barzilay.
Lee, Yoong Keok
author_sort Lee, Yoong Keok
collection MIT
description Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
first_indexed 2024-09-23T15:16:28Z
format Thesis
id mit-1721.1/97759
institution Massachusetts Institute of Technology
language eng
last_indexed 2024-09-23T15:16:28Z
publishDate 2015
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/97759 2019-04-10T12:45:18Z Context-dependent type-level models for unsupervised morpho-syntactic induction Lee, Yoong Keok Regina Barzilay. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 126-141). This thesis improves unsupervised methods for part-of-speech (POS) induction and morphological word segmentation by modeling linguistic phenomena previously not used. For both tasks, we realize these linguistic intuitions with Bayesian generative models that first create a latent lexicon before generating unannotated tokens in the input corpus. Our POS induction model explicitly incorporates properties of POS tags at the type-level which is not parameterized by existing token-based approaches. This enables our model to outperform previous approaches on a range of languages that exhibit substantial syntactic variation. In our morphological segmentation model, we exploit the fact that affixes are correlated within a word and between adjacent words. We surpass previous unsupervised segmentation systems on the Modern Standard Arabic Treebank data set. Finally, we showcase the utility of our unsupervised segmentation model for machine translation of the Levantine dialectal Arabic for which there is no known segmenter. We demonstrate that our segmenter outperforms supervised and knowledge-based alternatives. by Yoong Keok Lee. Ph. D.
2015-07-17T19:12:14Z 2015-07-17T19:12:14Z 2015 2015 Thesis http://hdl.handle.net/1721.1/97759 912300731 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 141 pages application/pdf Massachusetts Institute of Technology
spellingShingle Electrical Engineering and Computer Science.
Lee, Yoong Keok
Context-dependent type-level models for unsupervised morpho-syntactic induction
title Context-dependent type-level models for unsupervised morpho-syntactic induction
title_full Context-dependent type-level models for unsupervised morpho-syntactic induction
title_fullStr Context-dependent type-level models for unsupervised morpho-syntactic induction
title_full_unstemmed Context-dependent type-level models for unsupervised morpho-syntactic induction
title_short Context-dependent type-level models for unsupervised morpho-syntactic induction
title_sort context dependent type level models for unsupervised morpho syntactic induction
topic Electrical Engineering and Computer Science.
url http://hdl.handle.net/1721.1/97759
work_keys_str_mv AT leeyoongkeok contextdependenttypelevelmodelsforunsupervisedmorphosyntacticinduction