Exploiting cross-dialectal gold syntax for low-resource historical languages: towards a generic parser for pre-modern Slavic

Bibliographic Details
Main Author:	Pedrazzini, N
Other Authors:	Karsdorp, F
Format:	Conference item
Language:	English
Published:	CEUR Workshop Proceedings 2020

author	Pedrazzini, N
author2	Karsdorp, F
collection	OXFORD
description	This paper explores the possibility of improving the performance of specialized parsers for pre-modern Slavic by training them on data from different related varieties. Because of their linguistic heterogeneity, pre-modern Slavic varieties are treated as low-resource historical languages, so that cross-dialectal treebank data can be exploited to overcome data scarcity and to attempt the training of a variety-agnostic parser. Previous experiments on early Slavic dependency parsing are discussed, particularly with regard to their ability to tackle different orthographic, regional and stylistic features. A generic pre-modern Slavic parser and two specialized parsers, one for East Slavic and one for South Slavic, are trained using jPTDP [8], a neural network model for joint part-of-speech (POS) tagging and dependency parsing that had shown promising results on a number of Universal Dependencies (UD) treebanks, including Old Church Slavonic (OCS). These experiments set a new state of the art for both OCS (83.79% unlabelled attachment score (UAS) and 78.43% labelled attachment score (LAS)) and Old East Slavic (OES) (85.7% UAS and 80.16% LAS).
first_indexed	2024-03-07T02:26:14Z
format	Conference item
id	oxford-uuid:a5b485f0-09f7-40f9-8843-fbdb959816a0
institution	University of Oxford
language	English
last_indexed	2024-03-07T02:26:14Z
publishDate	2020
publisher	CEUR Workshop Proceedings
record_format	dspace
spelling	oxford-uuid:a5b485f0-09f7-40f9-8843-fbdb959816a0 | 2022-03-27T02:42:16Z | Conference item (http://purl.org/coar/resource_type/c_5794) | uuid:a5b485f0-09f7-40f9-8843-fbdb959816a0 | English | Symplectic Elements | CEUR Workshop Proceedings 2020 | Pedrazzini, N; Karsdorp, F; McGillivray, B; Nerghes, A; Wevers, M
title	Exploiting cross-dialectal gold syntax for low-resource historical languages: towards a generic parser for pre-modern Slavic
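
The description reports unlabelled and labelled attachment scores (UAS, LAS). As a rough illustration of what these metrics count, the following is a minimal sketch, not the evaluation procedure behind the reported figures: it scores predicted against gold CoNLL-U annotations, where UAS is the share of words assigned the correct head and LAS the share assigned both the correct head and the correct relation label. It assumes gold and predicted files share exactly the same tokenisation, counts every word (punctuation included), and compares full DEPREL strings. The function names `read_conllu`, `attachment_scores` and `row` are hypothetical.

```python
from typing import List, Tuple

# One scored word: (HEAD, DEPREL), i.e. CoNLL-U columns 7 and 8.
Token = Tuple[str, str]


def read_conllu(text: str) -> List[Token]:
    """Collect (HEAD, DEPREL) for every syntactic word in a CoNLL-U string."""
    tokens: List[Token] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines end sentences; '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 8.1)
        tokens.append((cols[6], cols[7]))
    return tokens


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages; assumes gold and pred share tokenisation."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted annotations must contain the same words")
    if not gold:
        raise ValueError("nothing to score")
    correct_head = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_head_and_label = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100 * correct_head / n, 100 * correct_head_and_label / n


def row(word_id: int, head: int, deprel: str) -> str:
    """Hypothetical helper: build a CoNLL-U line with placeholder columns."""
    return "\t".join([str(word_id), "w", "_", "_", "_", "_", str(head), deprel, "_", "_"])


# Toy sentence: word 3 gets the right head but the wrong label,
# word 4 gets the wrong head.
gold = "\n".join(["# text = toy example", row(1, 2, "det"), row(2, 0, "root"),
                  row(3, 2, "obj"), row(4, 2, "punct")])
pred = "\n".join([row(1, 2, "det"), row(2, 0, "root"),
                  row(3, 2, "nsubj"), row(4, 3, "punct")])
uas, las = attachment_scores(read_conllu(gold), read_conllu(pred))
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=75.00 LAS=50.00
```

Because LAS additionally requires the label to match, it can never exceed UAS on the same data, which is consistent with the OCS and OES figures above. Established scorers differ in details such as punctuation handling and whether relation subtypes (the part of DEPREL after a colon) are compared, so numbers from this sketch need not match published results.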

Similar Items