Exploiting cross-dialectal gold syntax for low-resource historical languages: towards a generic parser for pre-modern Slavic

Bibliographic Details
Main Author: Pedrazzini, N
Other Authors: Karsdorp, F
Format: Conference item
Language: English
Published: CEUR Workshop Proceedings 2020
collection OXFORD
description This paper explores the possibility of improving the performance of specialized parsers for pre-modern Slavic by training them on data from different related varieties. Because of their linguistic heterogeneity, pre-modern Slavic varieties are treated as low-resource historical languages, whereby cross-dialectal treebank data may be exploited to overcome data scarcity and attempt the training of a variety-agnostic parser. Previous experiments on early Slavic dependency parsing are discussed, particularly with regard to their ability to tackle different orthographic, regional and stylistic features. A generic pre-modern Slavic parser and two specialized parsers – one for East Slavic and one for South Slavic – are trained using jPTDP [8], a neural network model for joint part-of-speech (POS) tagging and dependency parsing which had shown promising results on a number of Universal Dependencies (UD) treebanks, including Old Church Slavonic (OCS). With these experiments, a new state of the art is obtained for both OCS (83.79% unlabelled attachment score (UAS) and 78.43% labelled attachment score (LAS)) and Old East Slavic (OES) (85.7% UAS and 80.16% LAS).
id oxford-uuid:a5b485f0-09f7-40f9-8843-fbdb959816a0
institution University of Oxford
spelling oxford-uuid:a5b485f0-09f7-40f9-8843-fbdb959816a0; 2022-03-27T02:42:16Z; Exploiting cross-dialectal gold syntax for low-resource historical languages: towards a generic parser for pre-modern Slavic; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:a5b485f0-09f7-40f9-8843-fbdb959816a0; English; Symplectic Elements; CEUR Workshop Proceedings; 2020; Pedrazzini, N; Karsdorp, F; McGillivray, B; Nerghes, A; Wevers, M