OldSlavNet: A scalable Early Slavic dependency parser trained on modern language data

Bibliographic Details
Main Authors:	Pedrazzini, N, Eckhoff, HM
Format:	Journal article
Language:	English
Published:	Elsevier 2021

Description
Summary:	Historical languages are increasingly being modelled computationally. Syntactically annotated texts are often a sine qua non in their modelling, but parsing of pre-modern language varieties faces severe data sparsity, intensified by high levels of orthographic variation. In this paper we present a good-quality Early Slavic dependency parser, attained via manipulation of modern Slavic data to resemble the orthography and morphosyntax of pre-modern varieties. The tool can be deployed to expand historical treebanks, which are crucial for data collection and quantification, and beneficial to downstream NLP tasks and historical text mining.
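The summary says the modern training data was manipulated to resemble pre-modern orthography. As a loose illustration of what one such rule might look like, the Python sketch below appends the back jer ъ after word-final consonants, reflecting the Early Slavic convention of writing a jer where modern spelling ends in a bare consonant. The function names and the single rule are hypothetical assumptions for illustration only. They are not the authors' transformation pipeline, which this record does not describe.

```python
# Hypothetical illustration: one orthographic "archaizing" rule applied to
# modern Cyrillic tokens. NOT the rule set used by OldSlavNet.

# Cyrillic consonant letters after which a final back jer (ъ) is written.
CONSONANTS = "бвгджзклмнпрстфхцчшщ"


def add_final_jer(token: str) -> str:
    """Append ъ (or Ъ for uppercase) if the token ends in a consonant letter."""
    if token and token[-1].lower() in CONSONANTS:
        return token + ("Ъ" if token[-1].isupper() else "ъ")
    return token  # vowels, й, punctuation and empty tokens are left unchanged


def archaize(tokens: list[str]) -> list[str]:
    """Apply the hypothetical rule to every token in a tokenized sentence."""
    return [add_final_jer(t) for t in tokens]


if __name__ == "__main__":
    print(archaize(["дом", "и", "город", "."]))
```

A realistic pipeline would need many more rules (and morphosyntactic changes as well), so this only shows the general shape of token-level orthographic manipulation.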

Similar Items