Adaptor Grammars for Learning Non-Concatenative Morphology

Bibliographic Details
Main Authors:	Botha, J; Blunsom, P
Format:	Conference item
Published:	Association for Computational Linguistics 2013

author	Botha, J; Blunsom, P
collection	OXFORD
description	This paper contributes an approach for expressing non-concatenative morphological phenomena, such as stem derivation in Semitic languages, in terms of a mildly context-sensitive grammar formalism. This offers a convenient level of modelling abstraction while remaining computationally tractable. The nonparametric Bayesian framework of adaptor grammars is extended to this richer grammar formalism to propose a probabilistic model that can learn word segmentation and morpheme lexicons, including ones with discontiguous strings as elements, from unannotated data. Our experiments on Hebrew and three variants of Arabic data find that the additional expressiveness to capture roots and templates as atomic units improves the quality of concatenative segmentation and stem identification. We obtain 74% accuracy in identifying triliteral Hebrew roots, while performing morphological segmentation with an F1-score of 78.1.
first_indexed	2024-03-06T18:13:05Z
format	Conference item
id	oxford-uuid:03aff8b9-93f1-47f9-bff4-706c95d9d2c7
institution	University of Oxford
last_indexed	2024-03-06T18:13:05Z
publishDate	2013
publisher	Association for Computational Linguistics
record_format	dspace
title	Adaptor Grammars for Learning Non-Concatenative Morphology
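
The description rests on the idea that a Semitic stem can be analysed as a consonantal root interleaved with a vocalic template, which makes the root a discontiguous string. The sketch below is a toy Python illustration of that phenomenon only. It is not the paper's grammar formalism or its adaptor grammar model. The slot symbol `C`, the function names `interdigitate` and `extract_root`, and the simplified Arabic transliterations (kataba "he wrote", kaatib "writer", maktuub "written", all built on the root k-t-b) are assumptions chosen for illustration.

```python
# Toy illustration of root-and-template (non-concatenative) morphology.
# Assumption: in a template, the hypothetical slot symbol "C" marks a position
# filled by the next root consonant; every other character is fixed material.
from typing import Optional

SLOT = "C"  # hypothetical slot marker, not taken from the paper


def interdigitate(root: str, template: str) -> str:
    """Fill the slots of `template` with the consonants of `root`, left to right."""
    n_slots = template.count(SLOT)
    if n_slots != len(root):
        raise ValueError(
            f"template {template!r} has {n_slots} slots but root {root!r} "
            f"has {len(root)} consonants"
        )
    consonants = iter(root)
    return "".join(next(consonants) if ch == SLOT else ch for ch in template)


def extract_root(word: str, template: str) -> Optional[str]:
    """Return the discontiguous root if `word` instantiates `template`, else None."""
    if len(word) != len(template):
        return None
    root = []
    for w, t in zip(word, template):
        if t == SLOT:
            root.append(w)  # slot position: this character belongs to the root
        elif w != t:
            return None  # fixed template material does not match the word
    return "".join(root)


if __name__ == "__main__":
    # One root, three templates (simplified Arabic transliteration).
    for template in ("CaCaCa", "CaaCiC", "maCCuuC"):
        word = interdigitate("ktb", template)
        print(template, word, extract_root(word, template))
```

Because the same root string is recovered from every word built on it, a lexicon that stores roots and templates as atomic units can reuse a single entry across all three words, whereas a purely concatenative segmenter would see three unrelated stems. As we read the abstract, this reuse is the intuition behind the reported gains, though the paper's actual model learns such units probabilistically rather than from hand-written templates.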

Similar Items