Adaptor Grammars for Learning Non-Concatenative Morphology
This paper contributes an approach for expressing non-concatenative morphological phenomena, such as stem derivation in Semitic languages, in terms of a mildly context-sensitive grammar formalism. This offers a convenient level of modelling abstraction while remaining computationally tractable. The...
Main Authors: | Botha, J, Blunsom, P |
---|---|
Format: | Conference item |
Published: | Association for Computational Linguistics, 2013 |
_version_ | 1797050968994480128 |
---|---|
author | Botha, J Blunsom, P |
author_facet | Botha, J Blunsom, P |
author_sort | Botha, J |
collection | OXFORD |
description | This paper contributes an approach for expressing non-concatenative morphological phenomena, such as stem derivation in Semitic languages, in terms of a mildly context-sensitive grammar formalism. This offers a convenient level of modelling abstraction while remaining computationally tractable. The nonparametric Bayesian framework of adaptor grammars is extended to this richer grammar formalism to propose a probabilistic model that can learn word segmentation and morpheme lexicons, including ones with discontiguous strings as elements, from unannotated data. Our experiments on Hebrew and three variants of Arabic data find that the additional expressiveness to capture roots and templates as atomic units improves the quality of concatenative segmentation and stem identification. We obtain 74% accuracy in identifying triliteral Hebrew roots, while performing morphological segmentation with an F1-score of 78.1. |
first_indexed | 2024-03-06T18:13:05Z |
format | Conference item |
id | oxford-uuid:03aff8b9-93f1-47f9-bff4-706c95d9d2c7 |
institution | University of Oxford |
last_indexed | 2024-03-06T18:13:05Z |
publishDate | 2013 |
publisher | Association for Computational Linguistics |
record_format | dspace |
spelling | oxford-uuid:03aff8b9-93f1-47f9-bff4-706c95d9d2c72022-03-26T08:47:38ZAdaptor Grammars for Learning Non-Concatenative MorphologyConference itemhttp://purl.org/coar/resource_type/c_5794uuid:03aff8b9-93f1-47f9-bff4-706c95d9d2c7Department of Computer ScienceAssociation for Computational Linguistics2013Botha, JBlunsom, PThis paper contributes an approach for expressing non-concatenative morphological phenomena, such as stem derivation in Semitic languages, in terms of a mildly context-sensitive grammar formalism. This offers a convenient level of modelling abstraction while remaining computationally tractable. The nonparametric Bayesian framework of adaptor grammars is extended to this richer grammar formalism to propose a probabilistic model that can learn word segmentation and morpheme lexicons, including ones with discontiguous strings as elements, from unannotated data. Our experiments on Hebrew and three variants of Arabic data find that the additional expressiveness to capture roots and templates as atomic units improves the quality of concatenative segmentation and stem identification. We obtain 74% accuracy in identifying triliteral Hebrew roots, while performing morphological segmentation with an F1-score of 78.1. |
spellingShingle | Botha, J Blunsom, P Adaptor Grammars for Learning Non-Concatenative Morphology |
title | Adaptor Grammars for Learning Non-Concatenative Morphology |
title_full | Adaptor Grammars for Learning Non-Concatenative Morphology |
title_fullStr | Adaptor Grammars for Learning Non-Concatenative Morphology |
title_full_unstemmed | Adaptor Grammars for Learning Non-Concatenative Morphology |
title_short | Adaptor Grammars for Learning Non-Concatenative Morphology |
title_sort | adaptor grammars for learning non concatenative morphology |
work_keys_str_mv | AT bothaj adaptorgrammarsforlearningnonconcatenativemorphology AT blunsomp adaptorgrammarsforlearningnonconcatenativemorphology |