One question, different annotation depths: A case study in Early Slavic
Main Author: | Pedrazzini, N |
---|---|
Format: | Journal article |
Language: | English |
Published: | Journal of Historical Syntax, 2022 |
_version_ | 1826307926729424896 |
---|---|
author | Pedrazzini, N |
author_facet | Pedrazzini, N |
author_sort | Pedrazzini, N |
collection | OXFORD |
description | This paper addresses some of the challenges of carrying out corpus-based linguistic analyses on historical corpora of different sizes and annotation depths. Data from the TOROT Treebank is collected to carry out a case study on Early Slavic dative absolutes, showing the extent to which methodology and results may change depending on the amount of data and the levels of linguistic annotation available. The analysis indicates that deeply-annotated treebanks of limited size can be exploited to establish a solid guideline to analyze a phenomenon in shallowly-annotated corpora and even new, unannotated texts. This is particularly encouraging for historical languages, such as Early Slavic, showing very high diatopic and diachronic variation, which significantly undermines corpus-annotation automation and therefore calls for alternative strategies to counteract data scarcity. |
first_indexed | 2024-03-07T07:10:26Z |
format | Journal article |
id | oxford-uuid:32da1c9c-a7ac-45ed-93f6-76bca70f3bee |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T07:10:26Z |
publishDate | 2022 |
publisher | Journal of Historical Syntax |
record_format | dspace |
title | One question, different annotation depths: A case study in Early Slavic |