One question, different annotation depths: A case study in Early Slavic

This paper addresses some of the challenges of carrying out corpus-based linguistic analyses on historical corpora of different sizes and annotation depths. Data from the TOROT Treebank is collected to carry out a case study on Early Slavic dative absolutes, showing the extent to which methodology a...

Bibliographic Details
Main Author: Pedrazzini, N
Format: Journal article
Language: English
Published: Journal of Historical Syntax 2022
author Pedrazzini, N
collection OXFORD
description This paper addresses some of the challenges of carrying out corpus-based linguistic analyses on historical corpora of different sizes and annotation depths. Data from the TOROT Treebank is collected to carry out a case study on Early Slavic dative absolutes, showing the extent to which methodology and results may change depending on the amount of data and the levels of linguistic annotation available. The analysis indicates that deeply-annotated treebanks of limited size can be exploited to establish a solid guideline to analyze a phenomenon in shallowly-annotated corpora and even new, unannotated texts. This is particularly encouraging for historical languages, such as Early Slavic, showing very high diatopic and diachronic variation, which significantly undermines corpus-annotation automation and therefore calls for alternative strategies to counteract data scarcity.
first_indexed 2024-03-07T07:10:26Z
format Journal article
id oxford-uuid:32da1c9c-a7ac-45ed-93f6-76bca70f3bee
institution University of Oxford
language English
last_indexed 2024-03-07T07:10:26Z
publishDate 2022
publisher Journal of Historical Syntax
record_format dspace
spelling oxford-uuid:32da1c9c-a7ac-45ed-93f6-76bca70f3bee; 2022-06-28T09:26:56Z; One question, different annotation depths: A case study in Early Slavic; Journal article; http://purl.org/coar/resource_type/c_dcae04bc; uuid:32da1c9c-a7ac-45ed-93f6-76bca70f3bee; English; Symplectic Elements; Journal of Historical Syntax; 2022; Pedrazzini, N
title One question, different annotation depths: A case study in Early Slavic