OldSlavNet: A scalable Early Slavic dependency parser trained on modern language data


Bibliographic Details
Main Authors:	Pedrazzini, N; Eckhoff, HM
Format:	Journal article
Language:	English
Published:	Elsevier 2021

Institution:	University of Oxford
Identifier:	oxford-uuid:415742e4-2c4e-4937-8947-fc37b35b496b
Description:	Historical languages are increasingly being modelled computationally. Syntactically annotated texts are often a sine-qua-non in their modelling, but parsing of pre-modern language varieties faces great data sparsity, intensified by high levels of orthographic variation. In this paper we present a good-quality Early Slavic dependency parser, attained via manipulation of modern Slavic data to resemble the orthography and morphosyntax of pre-modern varieties. The tool can be deployed to expand historical treebanks, which are crucial for data collection and quantification, and beneficial to downstream NLP tasks and historical text mining.
