OldSlavNet: A scalable Early Slavic dependency parser trained on modern language data

Historical languages are increasingly being modelled computationally. Syntactically annotated texts are often a sine-qua-non in their modelling, but parsing of pre-modern language varieties faces great data sparsity, intensified by high levels of orthographic variation. In this paper we present a go...

Bibliographic Details
Main Authors: Pedrazzini, N; Eckhoff, HM
Format: Journal article
Language: English
Published: Elsevier 2021
author Pedrazzini, N
Eckhoff, HM
collection OXFORD
description Historical languages are increasingly being modelled computationally. Syntactically annotated texts are often a sine-qua-non in their modelling, but parsing of pre-modern language varieties faces great data sparsity, intensified by high levels of orthographic variation. In this paper we present a good-quality Early Slavic dependency parser, attained via manipulation of modern Slavic data to resemble the orthography and morphosyntax of pre-modern varieties. The tool can be deployed to expand historical treebanks, which are crucial for data collection and quantification, and beneficial to downstream NLP tasks and historical text mining.
first_indexed 2024-03-06T21:20:42Z
format Journal article
id oxford-uuid:415742e4-2c4e-4937-8947-fc37b35b496b
institution University of Oxford
language English
last_indexed 2024-03-06T21:20:42Z
publishDate 2021
publisher Elsevier
record_format dspace