Unsupervised learning of lexical subclasses from phonotactics

Thesis: Ph. D. in Linguistics, Massachusetts Institute of Technology, Department of Linguistics and Philosophy, 2018.

Bibliographic Details
Main Author: Morita, Takashi, Ph. D. Massachusetts Institute of Technology
Other Authors: Adam Albright.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2019
Subjects: Linguistics and Philosophy.
Online Access: http://hdl.handle.net/1721.1/120612
Notes: This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 203-215).

Abstract: Languages are constantly borrowing words from one another. Since the donor and recipient languages typically differ in their phonology and phonotactics, the native words and the loanwords of the borrower language can also exhibit different phonology/phonotactics. Accordingly, it has been proposed that the phonotactics of languages such as Japanese is better explained if words are classified into etymologically defined sublexica. However, this sublexical analysis is challenged by a learnability problem: the sublexical membership of words is not directly observable. This study applies a state-of-the-art clustering method (a Dirichlet process mixture model) to a substantial number of Japanese and English words extracted from corpora. It turns out that the predicted clusters largely correspond to the etymologically defined sublexica. Since the clustering method is domain-general and not specialized to sublexicon identification, the results can be taken as statistical evidence for the heterogeneous lexica of the two languages. Moreover, the unsupervised nature of the clustering method demonstrates the learnability of sublexica from naturalistic data.

The learned sublexica also replicate linguistic characterizations of actual sublexica proposed in previous literature, such as the biased distribution of (certain substrings of) segments to particular sublexica. In addition, the learned sublexica make informative predictions based on previous experimental studies. These results suggest that the predicted sublexica are linguistically sound. Finally, the predicted sublexica reveal hitherto unnoticed phonotactic properties. These discoveries can be used for further investigation of native speakers' knowledge.

Statement of Responsibility: by Takashi Morita.
Degree: Ph. D. in Linguistics
Department: Massachusetts Institute of Technology. Department of Linguistics and Philosophy.
Date: 2018
Record Date: 2019-03-01T19:34:06Z
Other Identifier: 1088558202
Physical Description: 215 pages, application/pdf
Rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582