Nucleus Composition in Transition-based Dependency Parsing

Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is...


Bibliographic Details
Main Authors:	Joakim Nivre, Ali Basirat, Luise Dürlich, Adam Moss
Format:	Article
Language:	English
Published:	The MIT Press 2022-07-01
Series:	Computational Linguistics
Online Access:	http://dx.doi.org/10.1162/coli_a_00450

_version_	1827916642826321920
author	Joakim Nivre Ali Basirat Luise Dürlich Adam Moss
author_facet	Joakim Nivre Ali Basirat Luise Dürlich Adam Moss
author_sort	Joakim Nivre
collection	DOAJ
description	Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tesnière instead posits a more abstract notion of nucleus, which may be realized as one or more words. In this article, we investigate the effect of enriching computational parsing models with a concept of nucleus inspired by Tesnière. We begin by reviewing how the concept of nucleus can be defined in the framework of Universal Dependencies, which has become the de facto standard for training and evaluating supervised dependency parsers, and explaining how composition functions can be used to make neural transition-based dependency parsers aware of the nuclei thus defined. We then perform an extensive experimental study, using data from 20 languages to assess the impact of nucleus composition across languages with different typological characteristics, and utilizing a variety of analytical tools including ablation, linear mixed-effects models, diagnostic classifiers, and dimensionality reduction. The analysis reveals that nucleus composition gives small but consistent improvements in parsing accuracy for most languages, and that the improvement mainly concerns the analysis of main predicates, nominal dependents, clausal dependents, and coordination structures. Significant factors explaining the rate of improvement across languages include entropy in coordination structures and frequency of certain function words, in particular determiners. Analysis using dimensionality reduction and diagnostic classifiers suggests that nucleus composition increases the similarity of vectors representing nuclei of the same syntactic type.
first_indexed	2024-03-13T03:18:12Z
format	Article
id	doaj.art-70d276251bea45fb92e82c1f9106a649
institution	Directory Open Access Journal
issn	1530-9312
language	English
last_indexed	2024-03-13T03:18:12Z
publishDate	2022-07-01
publisher	The MIT Press
record_format	Article
series	Computational Linguistics
spelling	doaj.art-70d276251bea45fb92e82c1f9106a649 2023-06-25T14:50:05Z eng The MIT Press Computational Linguistics 1530-9312 2022-07-01 48 4 10.1162/coli_a_00450 Nucleus Composition in Transition-based Dependency Parsing Joakim Nivre Ali Basirat Luise Dürlich Adam Moss Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tesnière instead posits a more abstract notion of nucleus, which may be realized as one or more words. In this article, we investigate the effect of enriching computational parsing models with a concept of nucleus inspired by Tesnière. We begin by reviewing how the concept of nucleus can be defined in the framework of Universal Dependencies, which has become the de facto standard for training and evaluating supervised dependency parsers, and explaining how composition functions can be used to make neural transition-based dependency parsers aware of the nuclei thus defined. We then perform an extensive experimental study, using data from 20 languages to assess the impact of nucleus composition across languages with different typological characteristics, and utilizing a variety of analytical tools including ablation, linear mixed-effects models, diagnostic classifiers, and dimensionality reduction. The analysis reveals that nucleus composition gives small but consistent improvements in parsing accuracy for most languages, and that the improvement mainly concerns the analysis of main predicates, nominal dependents, clausal dependents, and coordination structures. Significant factors explaining the rate of improvement across languages include entropy in coordination structures and frequency of certain function words, in particular determiners. Analysis using dimensionality reduction and diagnostic classifiers suggests that nucleus composition increases the similarity of vectors representing nuclei of the same syntactic type. http://dx.doi.org/10.1162/coli_a_00450
spellingShingle	Joakim Nivre Ali Basirat Luise Dürlich Adam Moss Nucleus Composition in Transition-based Dependency Parsing Computational Linguistics
title	Nucleus Composition in Transition-based Dependency Parsing
title_full	Nucleus Composition in Transition-based Dependency Parsing
title_fullStr	Nucleus Composition in Transition-based Dependency Parsing
title_full_unstemmed	Nucleus Composition in Transition-based Dependency Parsing
title_short	Nucleus Composition in Transition-based Dependency Parsing
title_sort	nucleus composition in transition based dependency parsing
url	http://dx.doi.org/10.1162/coli_a_00450
work_keys_str_mv	AT joakimnivre nucleuscompositionintransitionbaseddependencyparsing AT alibasirat nucleuscompositionintransitionbaseddependencyparsing AT luisedurlich nucleuscompositionintransitionbaseddependencyparsing AT adammoss nucleuscompositionintransitionbaseddependencyparsing


Similar Items