OldSlavNet: A scalable Early Slavic dependency parser trained on modern language data
Main Authors: | Pedrazzini, N; Eckhoff, HM |
---|---|
Format: | Journal article |
Language: | English |
Published: | Elsevier, 2021 |
_version_ | 1797064884622458880 |
---|---|
author | Pedrazzini, N; Eckhoff, HM |
collection | OXFORD |
description | Historical languages are increasingly being modelled computationally. Syntactically annotated texts are often a sine qua non for such modelling, but parsing pre-modern language varieties faces severe data sparsity, intensified by high levels of orthographic variation. In this paper, we present a good-quality Early Slavic dependency parser, attained by manipulating modern Slavic data to resemble the orthography and morphosyntax of pre-modern varieties. The tool can be deployed to expand historical treebanks, which are crucial for data collection and quantification and beneficial to downstream NLP tasks and historical text mining. |
first_indexed | 2024-03-06T21:20:42Z |
format | Journal article |
id | oxford-uuid:415742e4-2c4e-4937-8947-fc37b35b496b |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-06T21:20:42Z |
publishDate | 2021 |
publisher | Elsevier |
record_format | dspace |
title | OldSlavNet: A scalable Early Slavic dependency parser trained on modern language data |