An unsupervised method for uncovering morphological chains

Bibliographic Details
Main Authors: Narasimhan, Karthik Rajagopal, Barzilay, Regina, Jaakkola, Tommi S.
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language: en_US
Published: Association for Computational Linguistics 2015
Online Access: http://hdl.handle.net/1721.1/100399
https://orcid.org/0000-0002-2921-8201
https://orcid.org/0000-0002-2199-0379
https://orcid.org/0000-0001-9894-9983
Description
Summary: Most state-of-the-art systems today produce morphological analysis based only on orthographic patterns. In contrast, we propose a model for unsupervised morphological analysis that integrates orthographic and semantic views of words. We model word formation in terms of morphological chains, from base words to the observed words, breaking the chains into parent-child relations. We use log-linear models with morpheme and word-level features to predict possible parents, including their modifications, for each word. The limited set of candidate parents for each word renders contrastive estimation feasible. Our model consistently matches or outperforms five state-of-the-art systems on Arabic, English and Turkish.
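
The log-linear parent prediction described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. The candidate generator (suffix splits plus a STOP option for base words), the feature names, and the weights are all hypothetical, and the sketch only normalizes scores over one word's candidate parents. It does not perform contrastive estimation or use semantic features.

```python
import math

def candidate_parents(word, min_stem=2):
    """Hypothetical candidate set: each split of `word` into (parent, suffix)
    with a parent of at least `min_stem` characters, plus a STOP candidate
    meaning the word is treated as a base form with no parent."""
    cands = [(word[:i], word[i:]) for i in range(min_stem, len(word))]
    cands.append((None, "STOP"))
    return cands

def features(parent, affix, vocab):
    """Illustrative features only: one morpheme-level (the suffix identity)
    and one word-level (whether the parent is an attested word)."""
    if parent is None:
        return {"stop": 1.0}
    return {
        "suffix=" + affix: 1.0,
        "parent_in_vocab": 1.0 if parent in vocab else 0.0,
    }

def parent_distribution(word, weights, vocab):
    """Log-linear model: P(candidate | word) is proportional to
    exp(weights . features), normalized over this word's candidates."""
    cands = candidate_parents(word)
    scores = [
        sum(weights.get(name, 0.0) * value
            for name, value in features(parent, affix, vocab).items())
        for parent, affix in cands
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {cand: e / z for cand, e in zip(cands, exps)}

# Hand-set weights, assumed purely for illustration.
vocab = {"paint", "painting"}
weights = {"parent_in_vocab": 2.0, "suffix=ing": 1.0}
dist = parent_distribution("painting", weights, vocab)
best = max(dist, key=dist.get)
print(best)
```

Because each word has only a handful of candidate parents, the normalizing sum stays small, which is the property the summary credits with making contrastive estimation feasible.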